Foreword


Welcome to Linux Audio Conference 2017 in Saint-Etienne! 


The field of computer music and digital audio is rich in well-known scientific conferences.
But the Linux Audio Conference is unique in this landscape! Its focus on Linux-based (but not
only) free/open-source software development and its friendly atmosphere make it the perfect place
to talk code from breakfast until late at night, to demonstrate early prototypes of software that
still crash, and more generally to exchange audio and music related technical ideas without fear.
LAC also offers a unique opportunity for users and developers to meet, to discuss features, to
provide feedback, to suggest improvements, etc.

LAC 2017 is the first edition to take place in France. It is co-organized by the University Jean Monnet
(UJM) in Saint-Etienne and GRAME in Lyon. 

GRAME is a National Center for Music Creation, an institution devoted to contemporary music and
digital art, scientific research, and technological innovation. In 2016 the center hosted 26 guest
composers and artists, and produced 88 musical events and 25 exhibitions in 20 countries. GRAME is the
organizer of the Biennale Musiques en Scene festival, one of France’s largest international festivals of
contemporary and new music, with guest artists such as Peter Eotvos, Kaija Saariaho, Michael
Jarrell, Heiner Goebbels and Michel van der Aa. GRAME develops research activities in the field of
real-time systems, music representation, and programming languages. Since 1999 all software
developed by GRAME has been open source and in most cases multiplatform (Linux, macOS, Windows,
Web, Android, iOS, ...).

UJM is part of the University of Lyon, a consortium of higher-education and research institutions
located within the two neighboring cities of Lyon and Saint-Etienne. The University of Lyon-Saint-Etienne
is the main French higher-education and scientific center outside the Paris metropolitan area,
composed of 4 public universities, 7 grandes ecoles and the CNRS (the French
National Center for Scientific Research), forming a group of 12 member institutions. The University
of Lyon also assembles 19 associated institutions, offering specific disciplinary training programs.
Altogether, the University of Lyon brings together 137,600 students and 168 public laboratories.

The Universite Jean Monnet (UJM), founded in 1969, is a comprehensive university enrolling some
20,000 students, about 15% of whom are international students from 111 countries. A high proportion of
international students come from Africa and Asia.

The university is composed of 5 faculties (arts, letters and languages; humanities and social sciences;
law; sciences and technology; and medicine) and four institutes: the Institut Superieur d’Economie
d’Administration et de Gestion, Telecom Saint-Etienne, and the University Institutes of Technology
(IUTs) in Saint-Etienne and Roanne.

The CIEREC is a research center devoted to the field of contemporary expression, which brings
together professors, researchers and PhD students in aesthetics and sciences of art, plastic arts,
design, digital arts, literature, linguistics and musicology. Its main field is the arts and literature of the
twentieth and twenty-first centuries.

The Music Department offers graduate, post-graduate and doctoral training in music and
musicology. In the field of technologies, it has offered since 2011 a Professional Master's Degree in
Computer Music (RIM) that is unique in France, in collaboration with GRAME, with the main audio
production studio of Saint-Etienne (Le FIL) and with the National Superior Conservatory of Music in
Lyon.




In 2016, we created another Professional Master's Degree, for Digital Arts (RAN). The Professional
Masters RIM & RAN aim to develop students’ applied knowledge and understanding of
electronic and digital technologies for creation, and they prepare students for the professions of "Producer" in
Computer Music (RIM - Realisateur en Informatique Musicale) and in Digital Arts (RAN -
Realisateur en Arts Numeriques). These producers are direct actors in musical and artistic
productions, and they are at the interface between software developers, applied computer scientists,
composers, artists ... and all people likely to integrate video, image and sound in their activities. Most
courses are available in English (see http://musinf.univ-st-etienne.fr/indexGB.html).

Thanks to all the contributors who submitted papers and proposed workshops, installations and music,
we will have a very interesting and varied program at the conference. We are pleased to welcome,
over a period of 4 days, some twenty talks on various subjects with speakers from different
backgrounds and countries.

We'd like to thank all those who contributed to the realization of this edition, and especially Albert
Graf, Philippe Ezequel, Jean-François Minjard, Stephane Letz, Lionel Rascle, Thomas Cipierre,
Sebastien Clara, Philippe Landrivon, David-Olivier Lartigaud, Jean-Jacques Girardot, Martine
Patsalis, Nadine Leveque-Lair and all the reviewers and members of the scientific and artistic
committees.

Thanks to our partners who helped to finance this conference: the CIEREC, GRAME, Masters RIM
& RAN, the UJM Music and Arts Departments, the Arts, Lettres, Langues Faculty, Random-Lab at
ESADSE (Art School of Saint-Etienne), Le Son des Choses, Electro-M, IDjeune, and the Commission
Sociale et Vie Etudiante at UJM.

LAC 2017 has also been partially funded by the FEEVER project [ANR-13-BS02-0008] supported by
the Agence Nationale pour la Recherche.

We hope that you will enjoy the conference and have a pleasant stay in Saint-Etienne! 


Vincent Ciciliato, Yann Orlarey and Laurent Pottier











LAC 2017 Teams 


Organizers 

• CIEREC (Centre Interdisciplinaire d’Etude et de Recherche sur l’Expression Contemporaine), 
director: Daniele Meaux, directors of the Electronic Team: Laurent Pottier & Vincent
Ciciliato 

• Music Department of Jean Monnet University (UJM), director: Anne Damon-Guillot 

• GRAME (National Center for Musical Creation), director: James Giroudon, scientific 
director: Yann Orlarey 

• Random-Lab, Center for Open Researches in Art, Design and New Media at ESADSE (Art 
School of Saint-Etienne), director: David-Olivier Lartigaud 

• Association « Le son des choses » (Acousmatic Music) 

• Association « Electro-M » (Masters RIM RAN students) 

Organizing committee 

• Vincent Ciciliato, Lecturer (Digital Arts) at CIEREC (UJM) 

• Thomas Cipierre, PhD student (Musicology) at CIEREC (UJM) 

• Sebastien Clara, PhD student (Musicology) at CIEREC (UJM) 

• Philippe Ezequel, Lecturer (Computer Sciences) at CIEREC (UJM) 

• Jean-Jacques Girardot, Programmer (Computer Sciences) at Le son des Choses 

• Stephane Letz, Researcher at GRAME 

• Philippe Landrivon, Audiovisual technician, ALL faculty (UJM) 

• David-Olivier Lartigaud, Director of Random-Lab (ESADSE) 

• Jean-François Minjard, Composer at Le Son des Choses

• Yann Orlarey, Scientific director of GRAME 

• Laurent Pottier, Lecturer (Musicology) at CIEREC (UJM)

• Lionel Rascle, Professor (Musical School - St Chamond & Rive-de-Gier)

Scientific committee 

• Fons Adriaensen 

• Marije Baalman 

• Tim Blechmann 

• Alain Bonardi 

• Ivica Ico Bukvic 

• Guilherme Carvalho 

• Vincent Ciciliato 

• Thierry Coduys 

• Myriam Desainte-Catherine 

• Goetz Dipper 

• Catinca Dumitrascu 

• Philippe Ezequel 






• John Ffitch 

• Dominique Fober 

• Robin Gareus 

• Albert Graf 

• Marc Groenewegen 

• Florian Hollerweger

• Madeline Huberth 

• Jeremy Jongepier 

• Pierre Jouvelot 

• David-Olivier Lartigaud 

• Victor Lazzarini 

• Stephane Letz 

• Fernando Lopez-Lezcano 

• Kjetil Matheussen 

• Romain Michon 

• Frank Neumann 

• Yann Orlarey

• Dave Phillips 

• Peter Plessas 

• Laurent Pottier

• Miller Puckette 

• Elodie Rabibisoa 

• Lionel Rascle 

• David Robillard 

• Martin Rumori 

• Bruno Ruviaro 

• Funs Seelen 

• Julius Smith 

• Pieter Suurmond 

• Harry Van Haaren 

• Steven Yi 

• Johannes Zmolnig 




SUMMARY 


Special Guests 

Paul Davis

Thierry Coduys

Conferences 

1. OpenAV Ctlra: A Library for Tight Integration of Controllers by Harry Van Haaren

2. Binaural Floss - Exploring Media, Immersion, Technology by Martin Rumori

3. A versatile workstation for the diffusion, mixing and post-production of spatial audio
by Thibaut Carpentier

4. Teaching Sound Synthesis in C/C++ on the Raspberry PI by Henrik Von Coler,
David Runge

5. Open Signal Processing Software Platform for Hearing Aid Research (openMHA) by
Tobias Herzke, Hendrik Kayser, Frasher Loshaj, Giso Grimm, Volker Hohmann

6. Towards dynamic and animated music notation using INScore by Dominique Fober,
Yann Orlarey, Stephane Letz

7. PlayGuru, a music tutor by Marc Groenewegen

8. Faust audio DSP language for JUCE by Adrien Albouy, Stephane Letz

9. Polyphony, sample-accurate control and MIDI support for FAUST DSP using
combinable architecture files by Stephane Letz, Yann Orlarey, Dominique Fober,
Romain Michon

10. faust2api: a Comprehensive API Generator for Android and iOS by Romain
Michon, Julius Smith, Stephane Letz, Chris Chafe, Yann Orlarey

11. New Signal Processing Libraries for Faust by Romain Michon, Julius Smith, Yann
Orlarey

12. Heterogeneous data orchestration - Interactive fantasia under Supercollider by
Sebastien Clara

13. Higher Order Ambisonics for SuperCollider by Florian Grond, Pierre Lecomte

14. STatic (LLVM) Object Analysis Tool: Stoat by Mark McCurry

15. AVE Absurdum by Winfried Ritsch

16. Multi-user posture and gesture classification for "subject-in-the-loop" applications
by Giso Grimm, Joanna Luberadzka, Volker Hohmann

17. VoiceOfFaust by Bart Brouns

18. On the Development of C++ Instruments by Victor Lazzarini

19. Meet the Cat: Pd-L2Ork and its New Cross-Platform Version "Purr Data" by Ivica
Bukvic, Albert Graef, Jonathan Wilkes
























Posters / Speed-Geeking 


Impulse-Response- and CAD-Model-Based Physical Modeling in Faust (poster) by
Pierre-Amaury Grumiaux, Romain Michon, Emilio Gallego Arias, Pierre Jouvelot

Fundamental Frequency Estimation for Non-Interactive Audio-Visual Simulations
(poster) by Rahul Agnihotri, Romain Michon, Timothy O'Brian

Porting WDL-OL to LADSPA (speed-geeking) by Jean-Jacques Girardot

Workshops

Concerts

Installations









LAC2017 - CIEREC - GRAME - Universite Jean Monnet - Saint-Etienne - France 




Special Guests 

PAUL DAVIS 

Paul Davis is the lead developer of the open source Ardour digital audio workstation, as well
as the JACK Audio Connection Kit. Before his 18 years of involvement with audio
software, Paul moved between academia and the corporate computing worlds, including 4-1/2
years at the University of Washington's Computer Science & Engineering department, and
then becoming the 2nd employee at Amazon.com. In 2008/2009 he taught at the Technische
Universität Berlin as the Edgar Varese Visiting Professor. Paul normally lives
near Philadelphia, PA, but can also be found living and working in a solar-powered van. To
his regret, Paul does not play any musical instruments.

Talk: 

20 years of Open Source Audio: Success, Failure and The In-Between 

I will talk about the 20-year history of open source audio development (focused on Linux but 
including other platforms when appropriate). It is a story that includes successes, failures and 
a lot of more ambiguous elements. I will discuss the way that the open source model does and 
does not help with software development, and also the sometimes surprising ways that "open 
source" might be pushing audio and music technology in the near future. 

THIERRY CODUYS

Artist, musician, new technology expert, Thierry Coduys specializes in collaborative and
multidisciplinary projects where interactivity meets the contemporary arts. Since 1986, he has
worked closely with the avant-garde of contemporary music (e.g. Karlheinz Stockhausen,
Steve Reich, ...) to realize electroacoustic and computer systems for live performance. After a
few years spent at IRCAM in Paris, he became the assistant to Luciano Berio. Building
on his experience of the contemporary art scene, he created his own company in 1999: an
artistic research and technology laboratory called ‘La kitchen’, where artists from various
horizons (e.g. music, dance, theatre, video, network) came to develop projects in collaboration
with the team and where artists were encouraged to use Open Source Software. Thierry has,
among other roles, been for more than 15 years the project manager of ‘IanniX’ (GNU GPL3
application), an interactive software interface inspired by the UPIC of Iannis Xenakis, and
senior consultant for the development of ‘Rekall’ (GNU GPL3 application), a video-annotation
software to document digital performances.

Talk: 

Why could the open source software change the way of writing for contemporary creation? 

I will try to explain how important it is to encourage artists to use open source tools. For
such a long time artists have been obliged to play the game of commerce and industry, and
also to use dedicated platforms in research and creation centres; they are now waiting for new
concepts and not only for new production tools. The Open Source community attracts brilliant
developers, very motivated and often not very well paid. Many reasons can explain this
interest, like clean design, reliability, easy maintenance in the respect of the rules and values
shared by the community, but also and mostly its freedom without constraints, which leaves
space for new concepts. I will show some examples of creation in collaboration with artists in
all phases of development and their impact on the functions of the software which are used in
the creation process.





Conferences 





OpenAV Ctlra: 

A Library for Tight Integration of Controllers 

Harry VAN HAAREN 

OpenAV 

Bohatch, 

Mountshannon, 

Co Clare, Ireland. 
harryhaaren@gmail.com 


Abstract 

Ctlra is a library designed to encourage integration
of hardware and software. The library abstracts
events from the hardware controller, emitting
generic events which can be mapped to functionality
exposed by the software.

The generic events provide a powerful method
to allow developers and users to integrate hardware
and software, however a good development workflow
is vital to users while tailoring mappings to their
unique needs.

This paper proposes an implementation to enable 
a fast scripting-like development workflow utilizing 
on-the-fly recompilation of C code for integrating 
hardware and software in the Ctlra environment. 

Keywords 

Controllers, Hardware, Software, Integration. 

1 Introduction 

Ctlra aims to enable easy integration between 
DAWs and controllers. At OpenAV we believe 
that enabling hardware controllers to be 1st 
class citizens in controlling music software will 
provide the best on-stage workflow possible. 

Ctlra has been developed due to the lack of a
simple C library that affords interacting with a
range of controllers in a generic but direct way
that enables tight integration.

1.1 Existing Projects 

Although many projects exist to enable hardware
access, very few aim to provide a generic
interface for applications to use.

Projects such as maschine.rs [Light, 2016],
HDJD [Pickett, 2017], OpenKinect [OpenKinect-Community,
2017] and CWiid [Smith, 2007] all
enable hardware access, however they each expose
a unique API to the application, resulting
in the need to explicitly support each controller.

The o.io [Freed, 2014] project aims to unify
communications for various types of interaction
using an OSC API, which is similar to
the generic events concept. Discoverability and
familiarity with the implementation presented
possible issues, so Ctlra is designed as a simple
C API that will be instantly familiar to seasoned
developers.

Hence, Ctlra is implemented as a C library 
that provides generic events to the application, 
regardless of the hardware in use. 

1.2 Modern Controllers 

Each year there are new, more powerful and 
complex hardware controllers, often with large 
numbers of input controls, and lots of feedback 
using LEDs etc. The latest generations have 
seen an uptake in high-resolution screens built 
into the hardware. 

The capabilities of these devices require an
equally powerful method to control the hardware,
or risk not utilizing them to their full potential.
As such, any library to interface with
these controllers should afford handling these
complex and powerful controller devices easily.

1.3 Why a Controller library? 

Although every application could implement its
own device-handling mechanism, there are significant
downsides to this approach.

Firstly, a developer will not have access to
all controllers that are available, so only a subset
of the controllers will have tight integration
with their software. As an end result, the user's
controller may not be directly supported by the
application.

Secondly, duplication of effort is significant,
both in the development and testing of the controller
support. This is particularly true if a
device supports multiple layers of controls.

Thirdly, advanced controller support features
like hotplug and supporting multiple devices of
the same type must also be tested - requiring
both access to multiple hardware units and
time.

The Ctlra library shares the effort required to
develop support for these powerful devices, providing
users and developers with an easy API
to communicate with the hardware.

1.4 Tight integration 

The terms “tight integration” or “deep integration”
are often used to describe hardware and
software that collaborate closely together; perhaps
they are even specifically designed to suit
one another.

Tight integration leads to better workflows
for on-stage usage of software, as it allows operations
from inside the software to be controlled
by the hardware device and appropriate feedback
returned to the user.

The advantage of tight integration is providing
a more powerful way of integrating the
physical device and the software. As an example,
many DAWs support MIDI Control Change
(CC) messages, and allow changing a parameter
with them. Although such a 1:1 mapping is
useful, most workflows require more flexibility.
For example, each physical control could affect
a number of parameters with weighting applied
to provide a more dynamic performance.
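
The weighted mapping described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct weighted_target type and the apply_weighted() function are hypothetical names invented here, not part of Ctlra, and the control value is assumed to be normalized to the range 0 to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pairing of an application parameter with a weight. */
struct weighted_target {
    float *param;   /* parameter in the application to drive  */
    float  weight;  /* how strongly the control affects it    */
};

/* Apply one normalized control value (0.0 to 1.0) to several targets,
 * scaling it by each target's weight. */
static void apply_weighted(float value,
                           const struct weighted_target *targets,
                           size_t count)
{
    assert(value >= 0.0f && value <= 1.0f);
    for (size_t i = 0; i < count; i++)
        *targets[i].param = value * targets[i].weight;
}
```

With two targets weighted 1.0 and 0.5, moving a single fader to its midpoint would set the first parameter to 0.5 and the second to 0.25, which a plain 1:1 CC mapping cannot express.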

1.5 Controller Mapping 

The Ctlra library allows mappings to be created
between physical controls and the target
software. DAWs could expose this functionality
for technical users - giving them full control
over the software.

Given the variation in live performances and
on-stage workflows, there is no ideal mapping
from a device to the application - it depends
on the user. As a result, OpenAV is of the
opinion that the best approach is to enable users
to create custom mappings from controllers to
software, using a generic event as the medium.

1.6 Scripting APIs 

Various audio applications provide APIs to allow
users to script functionality for their controller.
Enabling users to write scripts themselves requires
technical skill from the user, however it
seems like there is no viable alternative.

The solution presented in section 4 also proposes
“crowd-sourcing” the effort in writing
controller mappings to the users themselves, as
they have access to the physical device and have
knowledge of their ideal workflow.

Examples of audio applications that provide
scripting APIs are Ardour [Davis, 2017],
Mixxx [Mixxx, 2017] and Bitwig Studio [Bitwig,
2017]. Although Ableton Live [Ableton, 2017]
doesn’t officially expose a scripting API, there are
members of the community that have investigated
and successfully written scripts to control
it [Petrov, 2017].

A brief review shows high-level scripting languages
are favoured over compiled languages.
Mixxx and Bitwig are both using JavaScript,
while Ableton Live uses Python, and Ardour
uses the Lua language.

These solutions are all valid and workable,
however they do require that the application
developer exposes a binding API to glue the
scripting API to the core of the application.

With the exception of Lua, none of the above
scripting languages provide real-time safety unless
very carefully programmed - which should
not be expected of users' scripts.

OpenAV feels that providing controller support
in the native language of the application
ensures that all operations that the application
is capable of are also mappable to a controller.
Another advantage of having the controller mappings
in the native language of the application is
that they can be compiled into the application
itself.

2 Ctlra Implementation 

This section details the design decisions made 
during the implementation of the Ctlra library. 
The core concepts like the context, device and 
events are introduced. 

2.1 Ctlra Context 

The main part of the Ctlra library is the context,
which contains all the state of that particular
instance of the Ctlra library. This state is represented
by a ctlra_t in the code. Using a state
structure ensures that Ctlra is usable from inside
a plugin, for example an LV2 plugin.

Devices and metadata used by Ctlra are
stored internally in the ctlra_t. The end goal is
to enable multiple ctlra_t instances to exist in
the same process without interfering with one
another. This is more difficult than it sounds
as not all backends provide support for context
style usage.

2.2 Generic Events 

Ctlra is built around the concept of a generic
event. The generic event is a C structure
ctlra_event_t which may contain any
of the available event types. The available
event types include all common hardware
controller interaction types, such as BUTTON,
ENCODER, SLIDER and GRID. The events are
prefixed by CTLRA_EVENT_, so BUTTON becomes
CTLRA_EVENT_BUTTON.

Once the type of the event is established, 
the contents of the event can be decoded. The 
generic event has a union around all events, so 
an event must represent one and only one type 
of event. It is expected that the application will 
use a switch() statement to decode the event
types, and process them further. 
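
A minimal sketch of the tagged-union pattern described above follows. It is a local mock-up written for illustration, not the Ctlra header: every name prefixed with sketch_ is an assumption, the field layout is guessed from the text, and only the button and slider types are shown.

```c
#include <assert.h>
#include <stdint.h>

/* Local mock-up of a generic event; NOT the real Ctlra definitions. */
enum sketch_event_type {
    SKETCH_EVENT_BUTTON,
    SKETCH_EVENT_SLIDER,
};

struct sketch_event {
    enum sketch_event_type type;
    union {                       /* exactly one member is valid per event */
        struct { uint32_t id; int pressed; } button;
        struct { uint32_t id; float value; } slider;
    };
};

/* Hypothetical application state driven by the events. */
struct sketch_app {
    int   playing;
    float volume;
};

/* Decode one event with switch() and update the application. */
static void sketch_handle_event(struct sketch_app *app,
                                const struct sketch_event *e)
{
    assert(app && e);
    switch (e->type) {
    case SKETCH_EVENT_BUTTON:
        if (e->button.id == 0)
            app->playing = e->button.pressed;
        break;
    case SKETCH_EVENT_SLIDER:
        if (e->slider.id == 0)
            app->volume = e->slider.value;
        break;
    }
}
```

Because the type tag is checked before any union member is read, the handler never interprets a slider payload as a button or vice versa.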

The power of generic events is shown by the 
examples/daemon sample application, which 
translates any Ctlra supported device into an 
ALSA MIDI transmitting device. 

2.2.1 Button 

The button event represents physical buttons 
on a hardware device. It contains two variables, 
id and pressed. The button id is guaranteed 
to be a unique identifier for this device, based 
from 0, and counting to the maximum number 
of buttons. The pressed variable is a boolean 
value set high when the button is pressed by the 
user. 

2.2.2 Slider 

The slider event represents physical controls
that have a range of values, but the interaction
is of limited range, e.g. faders on a mixing desk.
The slider has an id as a unique identifier for
the slider, and a floating-point value that represents
the position of the control. The value
variable range is normalized as a linear value
from 0.f to 1.f to allow generic usage of the
event.

2.2.3 Encoder 

The encoder represents an endless rotary control
on a hardware device. There are two
types of encoders, which we will refer to as
“stepped” and “continuous”. Stepped controls
have notches providing distinct steps of movement,
while the continuous type is smooth and
provides no physical feedback during rotation.

The stepped controls notify the application
for each notch moved by setting
the ENCODER_FLAG_INT, and the delta change
is available from delta. Similarly the
ENCODER_FLAG_FLOAT tells the application to
read the delta_float value, and interpret the
value as a continuous control.
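
One way an application might treat both encoder styles uniformly is sketched below. The flag and field names follow the text, but the sketch_ prefix, the flag bit values, the struct layout and the step_size parameter are all assumptions made for this example.

```c
#include <assert.h>

/* Mock flags and struct; names follow the text, values are assumed. */
#define SKETCH_ENCODER_FLAG_INT   (1 << 0)
#define SKETCH_ENCODER_FLAG_FLOAT (1 << 1)

struct sketch_encoder_event {
    unsigned flags;
    int      delta;        /* notches moved, valid with FLAG_INT     */
    float    delta_float;  /* smooth movement, valid with FLAG_FLOAT */
};

/* Turn either encoder style into a change of a 0..1 parameter.
 * step_size is the hypothetical amount one notch should move it. */
static float sketch_encoder_apply(float param,
                                  const struct sketch_encoder_event *e,
                                  float step_size)
{
    assert(e);
    if (e->flags & SKETCH_ENCODER_FLAG_INT)
        param += (float)e->delta * step_size;
    else if (e->flags & SKETCH_ENCODER_FLAG_FLOAT)
        param += e->delta_float;
    /* Clamp so the parameter stays in its normalized range. */
    if (param < 0.0f) param = 0.0f;
    if (param > 1.0f) param = 1.0f;
    return param;
}
```

Checking the flag before reading a delta field keeps the application from using a value the device backend never filled in.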

2.2.4 Grid 

The grid represents a set of controls that are logically
grouped together, e.g. the squares of the
Push2 controller. The grid event type contains
multiple variables: id, flags, pos, pressure
and pressed.

The id identifies the grid number, allowing
controllers with more than one grid to distinguish
between them. The flags variable allows the
event to identify which values are valid in this
event. Currently two flags are defined, BUTTON
and PRESSURE; there are 14 bits remaining for
future expansion.

If the flag GRID_FLAG_BUTTON is set, the
pressed variable is valid to read, and represents
whether the button is currently pressed or not.
The BUTTON flag should only be set in the device
backend if the state of the grid-square has
changed; this eases handling events in the application.
When GRID_FLAG_PRESSURE is set, the
floating-point pressure variable may be read.
The pressure value is normalized to the range
0.f to 1.0f.

2.3 Devices 

In Ctlra, any physical controller is represented
internally by a ctlra_dev_t. Devices do not
appear available to the application directly, but
instead operations on the device are performed
through the ctlra_t context. There is an abstracted
representation of a device at the API
level, which the application has access to in the
event_handle() callback.

The reason that the device is not exposed
to the application directly is that ownership
and cleanup of resources becomes blurred when
hotplug functionality is introduced. Using the
ctlra_t context as a proxy for multiple devices
not only simplifies the application handling of
controllers, but actually helps define stronger
memory ownership rules too. See section 2.4
for hotplug implementation details.

2.3.1 Device Backends 

A device backend is how the software driver connects
to the physical device.

The implementation of the driver calls a
read() function, which indicates the driver
wishes to receive data. The backend library
will send an async read to the physical device,
and return immediately. Upon completion of
the transaction a callback in the driver is called
which decodes the newly received data, and can
emit events to the application if required. To
write data to the device, a write() is provided.

Note that a single device driver may open
multiple backends, or utilize multiple connections
of the same backend in order to fully support
the capabilities of the hardware. An example
could be a USB controller that exposes
both a USB interrupt endpoint for buttons and
a USB bulk endpoint for sending data to a high-resolution
screen.

Note that more backends can be added to 
support more devices if it is required in future. 

2.4 Hotplug Implementation 

Implementing a hotplug feature is difficult; it
requires handling device additions and removals
in the library itself, as well as a method to communicate
any changes of environment to the
application.

As Ctlra is a new library built from the 
ground up, hotplug was a consideration from 
the start as a required feature. As such, the API 
has been influenced by and designed for hotplug 
capabilities. The concept of a ctlra_t context 
that contains devices was introduced to allow 
transparent adding of devices without blurring 
memory ownership rules. 

Hotplug of USB devices is enabled by
LibUSB, which invokes a hotplug callback
when one is registered and
hotplug is supported on the platform. The
USB hotplug callback is utilized to call the
accept_device() callback in the application,
providing details of the controller. The info provided
allows the application to present the user
with a choice of accepting or rejecting the controller,
and if accepted, it will be added to the
ctlra_t context.

3 Application Usage of Ctlra 

This section will introduce the reader to the
steps required to integrate Ctlra into an application.
Refer to the examples/simple/simple.c
sample to see a minimal program in action.

The following steps summarize Ctlra usage: 

1. ctlra_create() 

2. ctlra_probe() 

• Accept controller in callback 

3. ctlra_iter() 

• Handle events in callback 

4. ctlra_exit() 

This creates a single ctlra_t context, then probes
for and accepts any supported controller. The accepted
controllers are connected to the particular
context that they were probed from.

Calling ctlra_iter() causes events to be
polled and the application is given a chance
to send feedback to the device. Finally,
ctlra_exit() releases any resources and gracefully
closes the context.
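
The create, probe, iterate and exit sequence can be illustrated with a self-contained mock. Nothing below is the Ctlra API: every sketch_ function is a stand-in written for this example, the mock always "finds" one pretend device, and the real signatures should be taken from the library headers and examples/simple/simple.c.

```c
#include <assert.h>
#include <stdlib.h>

/* All names below are local stand-ins, not the Ctlra API. */
typedef int  (*sketch_accept_fn)(const char *name, void *userdata);
typedef void (*sketch_event_fn)(int button_id, int pressed, void *userdata);

struct sketch_ctx {
    int             accepted;   /* 1 once a device has been accepted */
    sketch_event_fn on_event;
    void           *userdata;
};

/* Step 1: create a context that owns all library state. */
static struct sketch_ctx *sketch_create(void)
{
    return calloc(1, sizeof(struct sketch_ctx));
}

/* Step 2: probe; the mock always finds one device and asks the
 * application whether to accept it. Returns the number accepted. */
static int sketch_probe(struct sketch_ctx *c, sketch_accept_fn accept,
                        sketch_event_fn on_event, void *userdata)
{
    assert(c && accept && on_event);
    c->accepted = accept("mock-controller", userdata) ? 1 : 0;
    c->on_event = on_event;
    c->userdata = userdata;
    return c->accepted;
}

/* Step 3: iterate; the mock delivers one button press per call. */
static void sketch_iter(struct sketch_ctx *c)
{
    assert(c);
    if (c->accepted)
        c->on_event(0, 1, c->userdata);
}

/* Step 4: release everything the context owns. */
static void sketch_exit(struct sketch_ctx *c)
{
    free(c);
}

/* Example application callbacks. */
static int sketch_accept_all(const char *name, void *userdata)
{
    (void)name;
    (void)userdata;
    return 1;
}

static void sketch_count_presses(int button_id, int pressed, void *userdata)
{
    (void)button_id;
    if (pressed)
        (*(int *)userdata)++;
}
```

The point of the shape is inversion of control: the application only supplies callbacks, and all device handling happens inside the iterate call.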


3.1 Interaction 

The main interaction between Ctlra and the
application happens in two functions. Events
from the device are handled in the application
provided event_handle() function, while
feedback can be sent to a device from the
feedback_func().

These functions are callback functions, and
they are invoked for each device when the application
calls ctlra_iter().

To understand the events passed between the 
device and the application, please review the 
generic events (Section 2.2), and browse the 
examples/ directory. 

3.2 Controller’s View of State 

Each application has its own way of representing 
its state. Similarly, each controller has its own 
capabilities in terms of controls and feedback to 
the user. Given the specific application state 
and capabilities of the hardware, it is useful to 
create a struct specifically for storing the view 
that the controller has of the application. 

Note that the controller view should be
tracked per instance of the controller, as users
may have multiple identical controllers. This
controller’s instance of the struct is very useful
for remapping the controls to provide an alternate
map when a “shift” key is held down. As
the struct depends on the application and device,
this problem cannot be solved elegantly
at the library layer.

Ctlra provides a userdata pointer for each instance
which can be used to point to the
state struct. If the application’s state must be
accessed from the state struct, a “back-pointer”
to the application elegantly provides that.

The memory for the state struct can be allocated
in the accept_device() callback from
Ctlra, and the memory can be released
when the device is disconnected using the
remove_device() callback.
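
A per-controller view struct with a back-pointer and a shift layer might look like the sketch below. The struct and function names are hypothetical, the "mixer" stands in for whatever state a real application has, and the create and destroy helpers only mark where allocation and release would happen inside the accept and remove callbacks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical application state. */
struct sketch_mixer {
    float volume;
    float pan;
};

/* The view one controller instance has of the application. */
struct sketch_ctrl_view {
    struct sketch_mixer *app;   /* back-pointer to the application */
    int shift_held;             /* selects the alternate mapping   */
};

/* Would run from the accept callback: allocate per-device state. */
static struct sketch_ctrl_view *sketch_view_create(struct sketch_mixer *app)
{
    struct sketch_ctrl_view *v = calloc(1, sizeof(*v));
    if (v)
        v->app = app;
    return v;   /* the caller stores this in the userdata pointer */
}

/* Would run from the remove callback: release per-device state. */
static void sketch_view_destroy(struct sketch_ctrl_view *v)
{
    free(v);
}

/* Slider 0 drives volume normally and pan while shift is held. */
static void sketch_view_slider(void *userdata, float value)
{
    struct sketch_ctrl_view *v = userdata;
    assert(v && v->app);
    if (v->shift_held)
        v->app->pan = value;
    else
        v->app->volume = value;
}
```

Since each accepted device gets its own view, two identical controllers can hold different shift states while sharing one application through the back-pointer.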

4 Device Scripting in C 

This section describes a solution to providing 
a fast and interactive development workflow for 
scripting mappings between software and device 
using the C language. 

C is typically a compiled and static language,
not one that comes to mind when discussing dynamic
and scripting type workflows. Although
this is generally accurate, C can be used as a dynamic
language with certain compromises. The following
section details how applications can implement
a C scripting workflow for users to quickly
develop “Ctlra scripts”.

4.1 Dynamic Compilation 

Dynamically compiling C at runtime can be
achieved by bundling a small, lightweight C
compiler with your application. This may sound
a little crazy, but there are very small C
compilers designed for exactly this type of
usage. Here, the “Tiny C Compiler” project, or
TCC [Bellard, 2017], is used to compile C code
while the application is running.

Please note that the security of dynamically 
compiling code is not being considered here as 
the goal is to enable user-scripted controller 
mappings for musical performance. If security 
is a concern, the reader is encouraged to find a 
different solution. 

4.2 TCC and Function Pointers 

The TCC API has various functions to create a
compilation context, set includes, and add files
for compilation. Once initialized, TCC takes an
ordinary .c source file, and compiles it.

When compilation completes successfully,
TCC allows requesting functions from the script
by name, returning a function pointer.

The returned function pointer may be called 
by the host application, forming the method of 
communicating with the compiled script. 

4.3 The Illusion of Scripting 

To provide the illusion that the code is a script,
the application can check the modification time
of a script file, and recompile the file if needed.
Once the new function pointers are swapped in,
the updated code runs. The old program can
then be freed, cleaning up the resources that
were consumed by the now outdated script.

The examples/tcc_scripting/ directory 
contains a minimal example showing how the 
event handling for any Ctlra supported device 
can be dynamically scripted. 
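
The reload logic described in this section can be sketched independently of the compiler. The following is an illustration under stated assumptions, not the code from the examples directory: struct script, script_reload_if_changed() and the compile hook are hypothetical names, the script is assumed to export a single event handler, and stub_compile() replaces the real compile step so that the sketch runs without a compiler. It assumes a POSIX system for stat().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Signature of the single entry point a script is assumed to export. */
typedef void (*script_event_fn)(int control_id, float value);

/* Hypothetical compile hook: compiles `path` and returns the script's
 * entry point, or NULL on failure. A real host would call its bundled
 * compiler here and look the function up by name. */
typedef script_event_fn (*script_compile_fn)(const char *path);

struct script {
    const char *path;
    time_t mtime;               /* modification time at the last compile */
    script_event_fn handle;     /* currently active entry point, or NULL */
    script_compile_fn compile;
};

/* Stand-ins so the sketch runs without a compiler: the handler does
 * nothing and the "compiler" ignores the file contents. */
static void stub_handler(int control_id, float value)
{
    (void)control_id;
    (void)value;
}

static script_event_fn stub_compile(const char *path)
{
    (void)path;
    return stub_handler;
}

/* Returns 1 if the script was (re)compiled, 0 if it is unchanged and
 * -1 on error. On error the previous entry point stays active. */
static int script_reload_if_changed(struct script *s)
{
    struct stat st;
    assert(s != NULL && s->compile != NULL);

    if (stat(s->path, &st) != 0)
        return -1;
    if (s->handle != NULL && st.st_mtime == s->mtime)
        return 0;

    script_event_fn fresh = s->compile(s->path);
    if (fresh == NULL)
        return -1;

    /* Swap in the new function pointer. A real host would free the
     * previously compiled program after this point. */
    s->handle = fresh;
    s->mtime = st.st_mtime;
    return 1;
}
```

Calling script_reload_if_changed() once per iteration of the event loop is enough: the stat() call is cheap, and a failed compile leaves the last working handler in place.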

Providing this workflow requires some extra
integration work in the application; however,
that effort is easily repaid by the developer
time saved when scripting support for each
controller.

4.4 C and C++ APIs

Note that TCC is a C compiler only, explicitly
not a C++ compiler. This has some impact on
how scripts can interact with applications, as
many large open-source audio projects are
written in C++. The solution is to provide C
wrapper functions if the host’s language is C++.


Real-time software often passes messages as
plain C structs through ringbuffers. This is a
good way to communicate between dynamically
compiled scripts and the host, as it provides a
native C API as well as a method to achieve
thread-safe message passing.
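
A minimal sketch of such a ringbuffer is given below. It is a generic single-producer, single-consumer queue written with C11 atomics, not code taken from Ctlra or from any particular host; struct msg, struct ring and RING_SIZE are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical message: a plain C struct, so script and host agree on
 * its layout without any C++ involved. */
struct msg {
    int   control_id;
    float value;
};

#define RING_SIZE 64u   /* must be a power of two */

struct ring {
    struct msg  slots[RING_SIZE];
    atomic_uint head;   /* next slot to write; stored only by the producer */
    atomic_uint tail;   /* next slot to read; stored only by the consumer */
};

static void ring_init(struct ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side. Returns 1 on success, 0 if the ring is full. */
static int ring_push(struct ring *r, const struct msg *m)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return 0;
    r->slots[head % RING_SIZE] = *m;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return 1;
}

/* Consumer side. Returns 1 and fills *out, or 0 if the ring is empty. */
static int ring_pop(struct ring *r, struct msg *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->slots[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return 1;
}
```

Neither side blocks or allocates, which is what makes this pattern usable from a real-time thread. It is only safe with exactly one pushing thread and one popping thread.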

5 Case Study: Ctlra and Mixxx 

This section briefly describes the work per¬ 
formed to integrate Ctlra with the open-source 
Mixxx DJ software. It is presented here to 
showcase how to integrate the Ctlra library in 
an existing project. 

5.1 Implementation 

This section details the steps taken to integrate
the Ctlra library in Mixxx in order to test Ctlra
in the real world.

5.1.1 Class Structure 

Mixxx has a very object-oriented design,
utilizing C++ classes to abstract the behaviour
of control devices and the managers of those
control devices. The ControllerManager class
aggregates the different types of
ControllerEnumerator classes, which in turn
add Controller class instances to the list of
active controllers. Ctlra has been integrated as a
ControllerEnumerator sub-class for this
proof-of-concept implementation, although it
should really be integrated at the
ControllerManager level.

5.1.2 Threading in the Mixxx Engine 

The Mixxx engine currently creates many
threads. This design is supported by the use of
an “atomic database” of values (see Section
5.1.3). Given this design, the Ctlra integration
is done by spawning a Ctlra handling thread,
which performs all polling of and interaction
with Ctlra supported devices.

5.1.3 Communicating with the Engine 

The Mixxx engine is composed of values, which 
can be controlled from any thread anywhere 
in the code. These values are represented in 
the code by ControlObject and ControlProxy 
classes. A ControlObject is the equivalent to 
owning a value, while the ControlProxy allows 
atomic access to update the value. Lookup of 
these values is performed using “group” and 
“key” strings. The strings are constant, allowing
Ctlra and the Mixxx engine to understand the
meaning of each value represented by a
particular ControlProxy.

5.1.4 Mixxx’s C++ API

An issue arises because Mixxx’s ControlProxy
is a C++ class, which cannot be accessed from a
TCC-compiled script (refer to C and C++ APIs,
Section 4.4).

The solution is to create a C wrapper
function, which simply provides a C API to
the desired C++ function to be called on a
ControlProxy instance. This provides the
power of the Mixxx engine to the dynamically
compiled script code:

void mixxx_config_key_set(
        const char *group,
        const char *key,
        float value);
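
To experiment with scripts outside Mixxx, the group/key lookup behind such a wrapper can be imitated by a small table of atomically updated values. The sketch below is a toy stand-in and makes no claim about how Mixxx implements ControlObject or ControlProxy; control_find(), control_set() and control_get() are hypothetical names, and the two table entries are examples only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* One named value, identified by constant "group" and "key" strings. */
struct control {
    const char   *group;
    const char   *key;
    _Atomic float value;   /* zero-initialised; updated atomically */
};

/* Toy table; a real host would expose far more entries. */
static struct control controls[] = {
    { "[Channel1]", "volume" },
    { "[Channel1]", "play_indicator" },
};

static struct control *control_find(const char *group, const char *key)
{
    assert(group != NULL && key != NULL);
    for (size_t i = 0; i < sizeof(controls) / sizeof(controls[0]); i++) {
        if (strcmp(controls[i].group, group) == 0 &&
            strcmp(controls[i].key, key) == 0)
            return &controls[i];
    }
    return NULL;
}

/* Returns 1 if the value was stored, 0 if the group/key pair is unknown. */
static int control_set(const char *group, const char *key, float value)
{
    struct control *c = control_find(group, key);
    if (c == NULL)
        return 0;
    atomic_store(&c->value, value);
    return 1;
}

/* Returns the stored value, or `fallback` for an unknown pair. */
static float control_get(const char *group, const char *key, float fallback)
{
    struct control *c = control_find(group, key);
    return c ? atomic_load(&c->value) : fallback;
}
```

Because every access goes through constant strings and an atomic value, a script thread and an engine thread can share the table without further locking, which mirrors the role the text assigns to the group and key strings.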

5.2 Mixxx and Hotplug 

Since Ctlra hides the hotplug functionality
from the application through the design of the
accept_device() callback, Mixxx supports
on-the-fly plug-in and plug-out transparently.

This is achieved by the Ctlra library having
its own thread to poll events (see Section 5.1.2),
and handling the connect or disconnect events.
The Mixxx application code did not have to be
modified to support hotplugging of controllers
in any way (beyond adding basic Ctlra support).

5.3 Scripting Controller Support 

With the Ctlra library integrated in Mixxx, 
users are now able to script the tight integration 
of the Ctlra supported hardware and Mixxx. 
The next sections demonstrate simple mappings 
from a device to Mixxx and vice-versa. 

5.3.1 Event Input to Mixxx 

When a user presses a physical control on a de¬ 
vice, the action is presented to the application 
as an event. The user can map these events to 
the application in a variety of ways, in order to 
suit their own requirements on how they wish 
to control the software application. 

For example, the following snippet shows how 
we can bind slider ID 10 to channel 1 volume in 
Mixxx (note the usage of the C function from 
Section 5.1.4): 

case CTLRA_EVENT_SLIDER:
    switch (e->slider.id) {
    case 10:
        /* slider 10 sets the channel 1 volume */
        mixxx_config_key_set(
            "[Channel1]",
            "volume",
            e->slider.value);
        break;
    }
    break;


5.3.2 Mixxx Feedback to Device 

The reverse of the previous paragraph is to
send Mixxx state to the physical device,
providing feedback to the user. Each parameter
that Mixxx exposes via the ControlProxy is
available for reading as well as writing. This
allows the script to query the state of a
particular variable from Mixxx, and update the
state of an LED on the device, using the Ctlra
encoding for colour and brightness:

int play;
uint32_t led;

/* query the play state of channel 1 */
play = mixxx_config_key_get(
        "[Channel1]",
        "play_indicator");

/* fully lit while playing, off otherwise */
led = play > 0 ? 0xffffffff : 0;

ctlra_dev_light_set(dev,
        DEVICE_LED_PLAY,
        led);

6 Future Work 

Making Ctlra a ubiquitous library for event
I/O is a huge task; however, the benefit to all
applications if such a library existed would be
equally large.

Imagine scripting your DIY controller to
control any aspect of any software: there is
huge potential for a customized, powerful user
experience. OpenAV intends to integrate the
Ctlra library with any project that would
benefit from a powerful, customizable
workflow.

6.1 Device Support 

At the time of writing, the Ctlra library
supports six advanced USB HID devices, one
USB DMX device and a generic MIDI backend,
and plans are in place to support a common
Bluetooth console controller. More must be
added to make the Ctlra library really useful!

An interesting angle would be to let DIY
platforms like Arduino be used to build
controllers that use a generic Ctlra backend,
allowing such controllers to be supported
automatically.

The previously mentioned hardware enabling 
projects that provide access to specific hardware 
devices could be integrated with Ctlra, trans¬ 
parently benefiting applications that use Ctlra. 

The number of supported hardware devices is
paramount to the success of the Ctlra library, so
OpenAV welcomes patches or pull-requests that
add support for a device.

6.2 Software Environments

From the software point-of-view there is huge 
potential for integrating into existing software. 

For example mapping Ctlra events to LV2 
Atoms would expose the Ctlra backends to any 
LV2 Atom capable host. 

Integration with DSP languages like FAUST 
or PD may prove interesting and allow for faster 
prototyping and more powerful control over per¬ 
formance using those tools. 

Hardware platforms like the MOD Duo
[MOD, 2017] could use the Ctlra library to
enable musicians to use a wider variety of
controllers in their on-stage setups in
conjunction with the DSP on the Duo.

7 Conclusion 

This paper presents Ctlra, a library that allows 
an application to interface with a range of con¬ 
trollers in a powerful and customizable way. 

It shows how applications and devices can in¬ 
teract by using generic events. A case study 
showcases integrating Ctlra with the open- 
source Mixxx project as a proof of concept. 

To enable a fast development workflow for
creating mappings between applications and
devices, a method to dynamically compile C
code is introduced. This enables developers and
users to write mappings between devices and
applications as if C were a scripting language,
while retaining native access to the
application’s data structures.

Ctlra is available from GitHub [OpenAV,
2017]. Please run the sample programs in the
examples/ directory of the source to experience
the power of Ctlra yourself.

Abstract

Technology for binaural audio, that is, relating two 
audio signals to the psychophysical properties of the 
human hearing apparatus, is capable of recording, 
synthesising and reproducing the spatial informa¬ 
tion of an auditory environment comprising an im¬ 
mersive quality. While current scholarly research on 
binaural rendering and reproduction techniques for 
personal, mobile and interactive audio augmented 
environments is well advanced, their grounding with
respect to the aesthetic experience in an integral
listening act is not. Based on the case study of an
intermedia installation, Parisflaneur, an attempt
towards the exploration and reflection of binaural
media properties is made. Here, a special emphasis
is put on the role of FLOSS tools in an arts-based
research context.

Keywords 

binaural audio, immersion, FLOSS tools, intermedia
art, field recordings

1 Introduction 

Binaural audio means to relate a pair of audio
signals to the psychophysical properties of the
human hearing apparatus, that is, the signals
are regarded as so-called ear signals. Binaural
audio is among the earliest attempts of record¬ 
ing, reproducing and synthesising the spatial in¬ 
formation of an auditory scene by dummy head 
microphones, appropriate signal processing and 
by presenting the binaural signal pair isolated 
from each other to the left and right ear, re¬ 
spectively, usually via headphones. Nowadays, 
in the view of ubiquitous headphone use and the 
advent of widespread three-dimensional video 
projection, binaural technology constantly gains 
significance, and so does research on the optimal 
rendering and projection of personal, mobile and 
interactive audio augmented environments. 

When it comes to the creation of such environ¬ 
ments, optimisation targets become much less 


clear. Questions of immersion, perception and 
cognition arise as components of an integral aes¬ 
thetic experience. Methods in scholarly research 
usually segment complex processes such that, 
for instance, certain psychoacoustic parameters 
are isolated for separate investigation. The re¬ 
sults of listening tests according to such meth¬ 
ods often cannot be generalised for regarding a 
complex listening process that involves musical 
or anecdotal aspects of the sound material, cog¬ 
nitive contribution or previous experience by the 
listeners, to name just a few factors. 

Obviously, this paper cannot provide
solutions or answers. What I am going to present
is a personal attempt at approaching
theoretical, aesthetic and engineering
reflections along the development of an artistic
case study, Parisflaneur, which is a work in
progress.

In the next section, I will describe the case 
study from a phenomenological point of view, 
that is, how it appears to the visitor of an imag¬ 
ined exhibition. The description will be followed 
by a detailed discussion of technical implemen¬ 
tation decisions in close relation to aesthetic re¬ 
flections on conditions of the media involved. A 
special emphasis will be put on the role of Free 
and Libre Open Source Software (FLOSS) in the 
described process. 

2 Parisflaneur: visitor’s experience 

Parisflaneur is a sound installation that ex¬ 
plores the relation of binaural recording and bin¬ 
aural rendering of a virtual scene by providing 
a reactive, playful environment. 

From the outside, the appearance of
Parisflaneur is quite reduced: it does not
consist of much more than a pair of headphones
and an empty area in space of about twenty to
forty square meters. The visitor is invited to put
on the headphones and explore the installation
solely by listening and freely moving in the area,
whose boundaries are usually marked on the
floor.

Both the position and orientation of the
headphones are tracked, which to date requires
optical multi-camera tracking, given the
required latency limits and the relatively large
tracking volume. That means that a tracking
target, a rigid body of four or five reflective
balls, is a quite noticeable part mounted on top
of the headphones.¹ Additionally, in most
practical installations of Parisflaneur the
headphones are cabled, as no satisfying wireless
solution with respect to transmission quality
and robustness, low latency and signal dynamics
(i.e., no audio compression) has been available
so far. This fact is mentioned as it potentially
interferes with the visitor’s mobility (see
[Rumori, 2017]).



Figure 1: Visitor exploring Parisflaneur. 

1 A promising alternative is presented by the
Lighthouse system developed for the HTC Vive
goggles and to be released soon as an independent
tracking solution. It shall provide a nearly comparable
performance to camera-based systems by OptiTrack
or Vicon at a much lower cost and setup complexity, cf.
http://www.roadtovr.com/valve-sell-base-stations-directly-lower-barrier-steamvr-tracking-development/
(last retrieved February 27, 2017).

When the listener enters the installation, he
is presented with a virtual auditory scene, which
can be navigated. Urban and rural situations
such as a street, pedestrian area, or park are
recognisable by typical sounds like cars, footsteps,
voices, crickets, an aeroplane or rain. They ap¬ 
pear to come from different directions around 
the listener. When walking around guided by 
listening it turns out that each of the sound sit¬ 
uations is fixed at a certain location in space. 
Their positions may be found by bodily move¬ 
ment, approaching, turning towards and away 
from the sounds. They react by loudness attenu¬ 
ation and filtering on increasing distance and di¬ 
rectional changes relative to the listener’s head, 
compensating his movements and thus result¬ 
ing in a perceived steady configuration inscribed 
into the surrounding space. When the listener 
reaches exactly the same location as a sound sit¬ 
uation, it appears to reside inside his head. This 
auditory effect is a common experience when lis¬ 
tening to speaker-based stereophonic signals on 
headphones. In total, there are seven such
sound spots representing different everyday
situations in Parisflaneur.

When the location of a sound situation has been
found, the listener may “enter” it by performing
a ducking gesture, that is, by bending down such 
that the head goes well below the usual stand¬ 
ing or walking height and subsequently raising 
the head again at the found location. This pro¬ 
cedure is communicated to the visitors before¬ 
hand using the metaphor of tracing “sonic hats” 
in space which can be “put on” and “taken off.” 

Entering a sound situation yields a substantial
change in the audio listened to. The virtual
sound scenery composed of multiple anecdotal
situations gradually disappears except for
the single sound being entered. The remaining
one is no longer represented by a single spot but 
opens towards a rich, expanded auditory scene 
on its own that immerses the listener. Tech¬ 
nically, the rendered binaural signal is replaced 
by a static binaural recording, which also serves 
as a basis for the sound sources in the virtual 
scene. As the recording is static, it does not 
any longer respond to the listener’s movements 
but is attached to his head, as known from the 
common listening experience with headphones. 
In terms of the above-mentioned metaphor, the 
“sonic hat” that has been “put on” is now “car¬ 
ried around.” 

The entered sound situation may be left by
performing the ducking gesture once more: by
bending down and coming up again from
underneath the sound spot, thus “taking off” the
“sonic hat” and leaving it in space. The binaural
recording crossfades into the virtual scene again,
which comprises all seven sound situations, each
represented by a single point in space.

When a sound situation has been left, it
remains at the location in space where it was
dropped. That means that when the listener
moves with a “sonic hat” currently “put on”, the
spatial configuration of the virtual scene is
rearranged. There is no immediate audible
feedback hinting at this change, as in this
moment solely the entered situation is presented
in its non-reactive form. Only after the listener
has re-entered the virtual scene does the
reconfiguration become audible.

Tracing, entering, leaving and rearranging
everyday sound situations shall allow for a
playful exploration of the sound material and an
associative recombination of narratives in the
sense of anecdotal music as coined by Luc
Ferrari. Along with the perceptual differences of
rendered and reactive audio on the one hand and
recorded and static material on the other,
Parisflaneur is at the same time a study of the
properties and conditions of so-called immersive
spatial media.

3 Media, software, technology 

Several kinds of media technology are involved
in the realisation of Parisflaneur, among them
audio recording and reproduction over
headphones, optical tracking of position and
rotation, and the application logic that evaluates
the tracking data and finally controls different
levels of signal processing for creating the
presented output. The focus in this paper is on
the last-mentioned building blocks, which are
represented by software and form a major part
of the artistic development.

I am going to discuss the implementation with
a special emphasis on the application of FLOSS
tools and their relation to the artistic creation
process and aesthetic aims. As in most other
intermedia artefacts, the general-purpose
computer acts as a kind of meta-medium that,
owing to universal digital data representation,
allows for the actualisation of more specific
media machines by means of software
[Manovich, 2013]. It shall be stressed though
that all the other media involved, including and
especially non-digital ones, have an equally
significant influence on the aesthetics of the
work (for a discussion, see [Rumori, 2017]).

In the following, I will present technical con¬ 
siderations in stretto with reflections on the 
artistic process and on the properties of media. 


3.1 Software involved 


Parisflaneur is realised by combining a few
software building blocks, all of them being
FLOSS. The processing of the tracking data,
most of the signal processing and the application
logic are implemented in Supercollider.²
Supercollider allows for constructing modular
multichannel realtime signal processing
networks controlled by a general-purpose
object-oriented language. Details on the
binaural rendering will be presented in the
following sections, which will also make clear
why an open and flexible framework like
Supercollider is necessary for developing this
installation, rather than a monolithic, optimised
software package (cf. [Magnusson, 2008]).

Most binaural synthesis techniques involve a
matrix of realtime convolutions, sometimes
using room impulse responses of several seconds
duration. In earlier versions, Parisflaneur uses
24 binaural room impulse responses (BRIR) of
64 k samples each, in a later version 12 of those
BRIRs plus 36 free-field two-channel responses
of 512 samples. For performing the convolution,
Jconvolver by Fons Adriaensen is used
[Adriaensen, 2006b]. It provides very efficient,
low-latency, multi-threaded convolution while
matrices of any layout and of large sizes may be
configured. Supercollider and Jconvolver are
connected via the Jack Audio Connection Kit.³

The binaural room impulse responses (but
not the free-field ones) used for the convolution
were measured in the Cube laboratory at the
Institute of Electronic Music and Acoustics
Graz (IEM) [Rumori et al., 2010]. For the
measurements, a customised version of Aliki,
again by Fons Adriaensen, was used
[Adriaensen, 2006a]. Customisations include a
higher number of supported channels and some
automation facilities, which were used in
conjunction with Supercollider [Hollerweger and
Rumori, 2013].

Editing and processing of the field recordings
was performed using Ardour.⁴

2 http://supercollider.github.io (last retrieved
February 28, 2017)

3 http://www.jackaudio.org (last retrieved
February 28, 2017)


3.2 Binaural rendering 

At first glance, the rendering of the virtual au¬ 
ditory scene in Parisflaneur seems to be a stan¬ 
dard engineering problem of moderate complex¬ 
ity, which is a correct assumption to a large ex¬ 
tent. There are seven monaural point sources, 
not too many, each with the same trivial, that 
is, omni-directional radiation pattern, to be ren¬ 
dered in a so far not further specified virtual 
space, probably not requiring a too complex un¬ 
derlying model. The scene should be rendered 
for one dynamically moving listener according to 
tracking data input. The sound sources are not 
dynamically moving, and if so, their movements 
are not audible at the same time, which might 
allow for non-realtime optimisations. Further¬ 
more, only one source is moving at a time. 

Despite its moderate technical demands, 
Parisflaneur is not about developing or using an 
“optimal” binaural rendering technique. In fact, 
the artistic reflection is targeted at the ques¬ 
tion of what “optimal” could actually mean in 
this context. Does it mean to model as accu¬ 
rately as possible the physical sound propaga¬ 
tion starting from the emitters, the contribution 
of the surrounding space to the radiated sound 
waves, their arrival at the human head, finally 
the effect of a two-channel, spaced and individu¬ 
ally filtered pressure receiver array, our hearing 
apparatus? In other words: does it mean to 
capture the physics of an existing or imagined 
real-world situation and simulate it? 

Obviously, there is no corresponding real- 
world situation to seven spatial field recordings 
reduced to monaural signals and put into a nav¬ 
igable virtual space. Potentially, an installation 
of seven loudspeakers distributed in space and 
each playing back one of the recordings could 
come close physically but the mere thought ex¬ 
periment makes evident that the artistic point 
would be entirely missed. Although the
navigation aspect may be retained in principle,
each of the sound spots would be represented by
a physical object, being both an obstacle for
moving in space and a hindrance for the
orientation by listening due to its visual
presence. Apart from that, such an installation
would lack the reactive capability of entering
one of the recordings.

4 http://ardour.org (last retrieved April 3, 2017)

I wrote that the envisioned “hardware”
replication of the virtual scene “could come
close physically” and “may be retained in
principle” to indicate that the rendered scene
and its physical
ical counterpart have nothing in common in 
terms of sound propagation properties and reac¬ 
tive behaviour. There is no evidence whatsoever 
why the virtual scene should be designed such 
that its acoustical properties match those of re¬ 
ality. Rather its perception and cognition, that 
is, its integral aesthetic experience, shall pro¬ 
voke an imagination that supports the further 
engagement with the artwork. Aesthetic expe¬ 
rience depends on previously made experience. 
In the case of navigating an auditory environ¬ 
ment it relates to our spatial awareness which to 
date is mostly trained by orientation in reality. 
Again, this does not mean that matching
physical stimuli are sufficient, or the right way
at all, to evoke a matching auditory impression.
In Parisflaneur, probably among many other
examples, it is not even desired [Rumori, 2016].


This basic assumption [...], that a
subject will always hear the same
sound when exposed to identical sound
signals, is obviously not true [...]. Yet
[...], authentic reproduction is rarely
required. [... S]ound material on the
radio and on disk is processed in such
a way as to achieve the optimal
auditory effect, for instance, from an
artistic point of view. [Blauert, 1997, 374]


Blauert does not elaborate on how “the opti¬ 
mal auditory effect” would be approached and 
when it is reached. For a reason: processed 
sound material is only one part of an inte¬ 
gral aesthetical experience; individual percep¬ 
tion, various levels of familiarity with certain 
technologies, subjective cognitive contribution, 
cultural differences are others. From the “artis¬ 
tic point of view” there is no clear optimum ei¬ 
ther: artworks open perceptual spaces for in¬ 
dividual exploration and offer a multitude of 
strands for interpretation. Of course, a kind of
“aesthetic nucleus” can be assumed that is
central to both the artist’s and the recipient’s
reflection. There may be more or less
appropriate ways of grasping and conveying it
using media, but a single optimal one is unlikely
to exist.

Due to the absence of compulsory realisation
schemes in an artistic context, the rendering
techniques adapted for Parisflaneur, the sound
propagation laws modelled in and the rules of
reactive behaviour applied to the virtual
environment are found by experimentation and
intuition. The reference is not ear signals in
reality but the conditions and implications of
media. Here, this includes limitations of space
and tracking capability, computational power
and implementation feasibility, and, most
importantly, the cultural technique of
headphone listening and its heritage (for a
discussion on the latter, see [Rumori, 2017]).

3.2.1 Virtual Ambisonics 


Earlier implementations of Parisflaneur use a
modified virtual Ambisonics approach for
rendering the binaural scene [Noisternig et al.,
2003]. Instead of synthesised room acoustics
and free-field (i.e., anechoic) impulse responses,
measured binaural room impulse responses
(BRIR) are used. This way, the virtual scene is
embedded in captured real-room acoustic
properties rather than a simplified model.
Furthermore, the BRIRs were measured in the
location of the work’s first presentation, the
Cube laboratory at IEM Graz, such that the
virtual acoustics presented via headphones
matched that of the surrounding real space. The
idea was to provoke the notion of an overlay
inscribed into the existing aural space rather
than replacing it by a different one.

One disadvantage of combining the virtual
Ambisonics approach with BRIRs is that the
proposed rendering optimisations cannot be
applied unless the measured room acoustics is
assumed to be fully symmetric (cf. [Noisternig
et al., 2003]). More significantly, the
implementation is “incorrect” in terms of
communications engineering: as the BRIRs were
only measured for a single orientation of the
dummy head, rotation in the Ambisonics
domain upon tracking input results in the room
acoustics being turned along with the listener
while the relative source positions are correctly
adjusted. The resulting misleading spatial cues
may degrade localisation accuracy and
externalisation (cf. [Rumori, 2017]).


The virtual Ambisonics approach has been
incorporated in Parisflaneur using modified
classes of the AmbIEM Supercollider quark.⁵

5 https://github.com/supercollider-quarks/AmbIEM
(last retrieved February 28, 2017)

Figure 3: Block diagram of Parisflaneur using
the virtual Ambisonics approach.


3.2.2 Distance model 

Classical Ambisonics does not encode distance
information of sound sources, only directions,
that is, sources are treated as plane waves.
Extensions exist to take into account the near
field effect of the projection system by
appropriate filters [Daniel, 2003] or to use an
additional Ambisonics channel for encoding
source distances [Penha, 2008]. Still, these do
not include models for translating a distance
vector into processing parameters like
amplitude attenuation, low-pass filtering, or the
ratio of direct signal to reverb energy.

Scholarly research shows that auditory
distance estimation is highly dependent on the
source material and cannot be reliably
performed even in reality [Zahorik, 2002]. In
the light of the reflection above (see section 3.2),
modelling the source distance in a rendered
scene is not a means of referring to reality but
to the aesthetic framework of the installation.

Amplitude attenuation in Parisflaneur is
much stronger than in reality as described by
the inverse-square law. Otherwise, the seven
sound situations would not be distinguishable
at all by approaching one or the other, as their
levels would differ too little, given the limited
tracking volume and the relatively low
maximum distances of sources. Similarly,
low-pass filtering by air absorption would be
hardly noticeable at such short distances, while
in Parisflaneur it is used as an acoustical
“magnifier” for the closer surrounding of the
listener.

In implementations of Parisflaneur using the
virtual Ambisonics approach (see section 3.2.1),
the ratio of direct and reverb signal energy is
fixed by the impulse responses. It could be made
variable by an implementation using two
Ambisonics domains, a “dry” and a “wet” one.
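
To make the idea of an exaggerated distance law concrete, the following C sketch maps a source distance to an amplitude gain and a low-pass cutoff. It is an illustration only: the paper gives no formulas or parameter values, so the power-law form, the linear cutoff mapping and all names and numbers are assumptions.

```c
#include <assert.h>

/* Amplitude gain for a source at `distance`, relative to `ref_dist`.
 * order = 1 corresponds to free-field spreading (amplitude ~ 1/d, the
 * inverse-square law for intensity); larger orders fall off faster. */
static float distance_gain(float distance, float ref_dist, int order)
{
    assert(ref_dist > 0.0f && order >= 1);
    if (distance <= ref_dist)
        return 1.0f;              /* no boost inside the reference radius */
    float ratio = ref_dist / distance;
    float gain = 1.0f;
    for (int i = 0; i < order; i++)
        gain *= ratio;
    return gain;
}

/* Low-pass cutoff in Hz, interpolated linearly between `near_hz` at
 * distance 0 and `far_hz` at `max_dist`, and clamped outside that range. */
static float distance_cutoff(float distance, float max_dist,
                             float near_hz, float far_hz)
{
    assert(max_dist > 0.0f);
    float t = distance / max_dist;
    if (t < 0.0f)
        t = 0.0f;
    if (t > 1.0f)
        t = 1.0f;
    return near_hz + t * (far_hz - near_hz);
}
```

With order 1, doubling the distance halves the amplitude (about 6 dB); with order 3, the same step reduces it to one eighth (about 18 dB), which separates nearby sources much more clearly within a small tracking volume.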

3.2.3 Circular panning 

More recent implementations of Parisflaneur
drop the virtual Ambisonics approach as most
of its advantages do not apply here. In contrast
to the three-dimensional Ambisonics approach,
the rendering was additionally reduced to two
dimensions. Listeners in Parisflaneur mostly
move in a plane only, while the third dimension
has no orienting function. The sound spots
are meant to be at ear level all the time,
independent of the height of the listener. When
applying the “sonic hat” metaphor (see section
2), elevation information could have a certain
value, but this interaction scheme was also
replaced by a different one in later versions of
the installation (see [Rumori, 2017]).

Currently, Parisflaneur incorporates two domains
of simple circular panning, implemented
using SuperCollider’s PanAz unit generator. One
domain uses 12 channels of a measured circular
loudspeaker array in a fairly reverberant room,
while the second one has 36 output channels
representing a ten-degree resolution of anechoic
impulse responses taken from the SoundScape
Renderer project. Sources in the far field are
projected using the first panning domain while the
energy contribution is gradually shifted towards
the second domain for closer sources. Obviously,
the latter represents a stronger direct portion of
the source signal.
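
The two-domain scheme described above can be sketched as follows.
This is a minimal Python illustration, not the SuperCollider code of
the installation and not a reimplementation of PanAz: the equal-power
pairwise panning law, the function names and the near/far distance
thresholds are assumptions made for the example.

```python
import math

def pan_circular(azimuth_deg, num_channels):
    """Equal-power gains for a ring of equally spaced channels.

    The source is panned between the two channels adjacent to its
    azimuth; all other channels stay silent.
    """
    if num_channels < 2:
        raise ValueError("need at least two channels")
    pos = (azimuth_deg % 360.0) / 360.0 * num_channels
    lower = int(math.floor(pos)) % num_channels
    upper = (lower + 1) % num_channels
    frac = pos - math.floor(pos)
    gains = [0.0] * num_channels
    gains[lower] = math.cos(frac * math.pi / 2.0)
    gains[upper] = math.sin(frac * math.pi / 2.0)
    return gains

def domain_mix(distance, near=0.5, far=3.0):
    """Return (far_gain, near_gain) as an equal-power crossfade.

    near and far are hypothetical distances in metres.
    """
    t = min(max((distance - near) / (far - near), 0.0), 1.0)
    return math.sin(t * math.pi / 2.0), math.cos(t * math.pi / 2.0)

def render_gains(azimuth_deg, distance):
    """Gains for the 12-channel reverberant and 36-channel anechoic domain."""
    far_gain, near_gain = domain_mix(distance)
    reverberant = [g * far_gain for g in pan_circular(azimuth_deg, 12)]
    anechoic = [g * near_gain for g in pan_circular(azimuth_deg, 36)]
    return reverberant, anechoic
```

Each output channel would then be convolved with the impulse response
pair measured for its direction; that step is omitted here.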

For sources very close to the listener’s head,
the binaural domain in a classic understanding
is left, that is, no head-related impulse
responses are involved anymore. Instead, the
usually undesired effects of intensity panning on
headphones are exploited for provoking near-field
and in-head experiences (see section 3.3.3).

3.3 Applications of binaural recordings 

What does it mean to represent a spatial,
head-related field recording by a monaural
single-point object in a virtual auditory scene?
Similar to sound stored on tape, a vinyl record or
a compact disc, the recording becomes an object
in terms of the environment, be it a physical
carrier medium or a sound source rendered in
virtual space. This is different from simply
playing it back, which rarely focuses on the
recording medium itself; rather, its properties
are meant to be hidden behind what is recorded.
In Parisflaneur, the relation of the recording in
its head-related form and its appearance as a
virtual object is a central point of reflection,
plus the anecdotal, that is, musical relation of
several of such objects to each other by
providing them for rearrangement by the listener.

http://spatialaudio.net/ssr/ (last retrieved
February 27, 2017)

While the recordings are left largely unprocessed
for their binaural presentation when a
sound situation is “entered,” their monaural
counterparts as objects in the virtual scene have
to be derived from the recordings with some
treatment.

3.3.1 Monaural representation 

An important point is to achieve some degree of
monaural compatibility in order to reduce comb
filter effects especially in the lower frequencies
when mixing both channels of a binaural
recording to a single one.
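
The comb filter effect can be made concrete with a simplified model in
which one ear signal is a delayed copy of the other. The Python sketch
below rests on that assumption; the delay of 0.5 ms is an illustrative
interaural time difference, not a measured value, and real ear signals
also differ in level and spectrum.

```python
import cmath
import math

def mono_sum_gain_db(freq_hz, delay_s):
    """Gain of (L + R) / 2 when R is a copy of L delayed by delay_s."""
    h = 0.5 * (1.0 + cmath.exp(-2j * math.pi * freq_hz * delay_s))
    magnitude = abs(h)
    return 20.0 * math.log10(magnitude) if magnitude > 0.0 else float("-inf")

def notch_frequencies(delay_s, max_hz=20000.0):
    """Frequencies where the two copies cancel: (2k + 1) / (2 * delay)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches

if __name__ == "__main__":
    itd = 0.0005  # assumed interaural delay in seconds
    print(notch_frequencies(itd)[:3])
    print(round(mono_sum_gain_db(500.0, itd), 2))
```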

A simple monaural representation would only 
use one channel of the binaural recording, and in 
fact that has been done in preliminary versions 
of Parisflaneur. Of course this results in an un¬ 
balanced spatial interpretation of the signal, as 
the higher frequency portions at the far side that 
are attenuated by the head are omitted. Nev¬ 
ertheless, for providing an overall impression of 
a field recording and its recognition in a virtual 
scene this solution may suffice. 

A more advanced approach to monaural
compatibility would be to turn the phase differences
in the low frequencies into level differences. This
is exactly the purpose of the so-called Blumlein
Shuffler [Gerzon, 1994], “the greatest forgotten
invention in audio engineering”. It was
patented by Alan Blumlein in 1933 for the
loudspeaker reproduction of time-of-arrival
stereophonic signals.

In Parisflaneur, the Blumlein Shuffler
implementation bls1 by Fons Adriaensen is used. It
provides one of the few accessible implementations,
the most advanced one due to its use of
carefully designed FIR filters and, to my
knowledge, the only free and libre implementation.


http://www.pspatialaudio.com/blumlein_delta.htm
(last retrieved February 27, 2017)

http://kokkinizita.linuxaudio.org/linuxaudio/zita-bls1-doc/quickguide.html
(last retrieved February 27, 2017)

3.3.2 Frequency response

A virtual source’s spectrum is likely to be dis¬ 
torted by rendering compared to that of the un¬ 
derlying binaural recording. As both instances 
are related to each other in the installation, the 
signals used as virtual sources are filtered ac¬ 
cording to experimental exploration of different 
spatial constellations, that is, different rendered 
directions and distances from the listener. 


3.3.3 Transition design 

The moment of transition from the virtual scene
to the binaural recording and back is one of the
central aesthetic experiences in Parisflaneur,
hence the importance of its design. In the course
of refining the work, transition design evolved
from a simple cross-fade between the two
domains, nevertheless using special overlapping
curves, towards a more complex multi-stage
process.

In the phenomenological description (see
section 2), I stated that in-head localisation in
the virtual scene is desired in order to indicate
the exact position of a sound spot. Early
implementations of Parisflaneur used the virtual
Ambisonics approach for the binaural rendering of
the scene (see section 3.2.1). For closer sources,
the energy contribution of higher Ambisonics
orders is gradually reduced after encoding, which
achieves a spatial widening until only the zeroth
order remains when the position of the source
is reached. This corresponds to an omnidirectional
receiver pattern at the listener’s position,
hence the source’s signal is projected equally
from all directions in the virtual Ambisonics
speaker setup. In a certain understanding, this
might represent the notion of being “inside” a
sound source, especially in the case of real
loudspeaker reproduction and when the source is
attributed a certain extension, for instance, that
of the reproduction space.
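
The gradual reduction of higher orders can be sketched as a set of
distance-dependent gains applied per Ambisonics order after encoding.
The Python example below is only an illustration of the idea: the
fading curve (a power law), the fade_start distance and the function
names are assumptions, and the channel ordering assumed is ACN, where
channel k belongs to order floor(sqrt(k)).

```python
import math

def order_gains(distance, max_order=3, fade_start=1.0):
    """Per-order gains: all ones beyond fade_start, order 0 only at distance 0.

    Higher orders fade faster (t ** n), so the image widens progressively
    as the listener approaches the source position.
    """
    t = min(max(distance / fade_start, 0.0), 1.0)
    return [1.0 if n == 0 else t ** n for n in range(max_order + 1)]

def channel_gains(distance, max_order=3, fade_start=1.0):
    """Expand the per-order gains to one gain per ACN channel."""
    per_order = order_gains(distance, max_order, fade_start)
    return [per_order[math.isqrt(acn)] for acn in range((max_order + 1) ** 2)]
```

At distance zero only the omnidirectional component passes, which
matches the omnidirectional receiver pattern described above.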

For the binaural projection of Parisflaneur
and its narrative, another approach to conveying
the “inside” notion appears to be much more
appropriate: the often undesired in-head
localisation of loudspeaker-based stereophony or
monaural signals presented on headphones. Its
application means leaving the integrity of both
binaural playback and binaural rendering in
a strict sense of communications engineering.
Rather, signals usually not considered binaural
are interpreted as ear signals in order to
exploit the resulting, yet uniquely binaural effect.
For this reason, I do not attribute the quality
“binaural” to a signal pair because of its
technical properties such as the presence of
interaural time or level differences but rather due
to its intentional interpretation as ear signals.
Furthermore, this example is a strong indication
why open software systems are a precondition
for pursuing the artistically motivated approach
to binaural technology as described here. Most
monolithic implementations, even if advanced
and optimised with respect to latest research, do
not allow for modelling and accessing the signal
path at every level.

When experimenting with the above-mentioned
Blumlein Shuffler (see section 3.3.1),
I noticed that its output provides a perceptual 
bridge between monaural in-head localisation 
and binaural externalisation. Some features are 
retained from the originating binaural signal 
allowing for a partial externalisation, while 
others, due to their monaural compatibility, 
enable panpot-like processing for achieving a 
variable in-head stereo width. In later imple¬ 
mentations of Parisflaneur , such a Blumlein 
shuffled stereo signal is used as an intermediate 
transition phase for gradually opening the 
monaural in-head spot, until the listener’s head 
is “left” by fading into the immersive binaural 
recording. 

4 Conclusion 

In this paper, I presented an intermedia instal¬ 
lation of mine called Parisflaneur. It takes place 
in auditory space which is presented binaurally 
via headphones. The work incorporates seven 
urban and rural sound situations arranged in a 
virtual scene that is navigable by bodily motion 
and orientation by listening. Upon interaction, 
each of the sound situations can be entered, that 
is, the virtual scene can be left in favour of the 
original static, binaural recording of that
situation. Subsequent movements do not allow for
further navigation within the situation; instead,
the virtual scene will be rearranged, which
becomes audible only after having left the
static recording again.

I described in detail the visitor’s experience of 
the installation and realisation alternatives us¬ 
ing FLOSS tools. By doing so, I tried to relate 
technical implementation details to both com¬ 
mon approaches as suggested by scholarly re¬ 
search and to alternative findings driven by aes¬ 
thetic reflection and artistic experimentation. 
One of my central arguments is that the design
of virtual audio environments always has to
reference the aesthetic experience and the
conditions of their reception rather than explicit or
implicit real-world situations. The presentation
of spaces by transforming media such as binaural
audio technology is not a real-world experience
in the sense of sound propagation directly
and solely through air.

I aimed at pointing out that FLOSS tools are 
a precondition for artistic engineering as per¬ 
formed in the presented project. As any given 
approach or process is subject to critical reflec¬ 
tion and potential modification, the implemen¬ 
tations involved have to be accessible anywhere 
in the signal path and at any level that turns 
out to be appropriate. Neither would it be
possible for me (and probably for any artist) to
implement all the building blocks myself that
require a deep access to their inner mechanisms,
nor would monolithic and closed software allow
for entangling artistic quest, aesthetic reflection
and engineering ambition as I have attempted to
exemplify in this paper.

Abstract

This paper presents a versatile workstation for the 
diffusion, mixing, and post-production of spatial 
sound. Designed as a virtual console, the tool pro¬ 
vides a comprehensive environment for combining 
channel-, scene-, and object-based audio. The in¬ 
coming streams are mixed in a flexible bus archi¬ 
tecture which tightly couples sound spatialization 
with reverberation effects. The application supports 
a broad range of rendering techniques (VBAP, HOA, 
binaural, etc.) and it is remotely controllable via the 
Open Sound Control protocol. 

Keywords 

sound spatialization, mixing, post-production, 
object-based audio, Ambisonic 

1 Introduction 

This paper presents a port of the panoramix 
workstation to Linux. First, we give a brief 
presentation of panoramix and typical use-cases 
of this environment. Then, we present some 
recently added features and discuss the chal¬ 
lenges involved with porting the application to 
the Linux OS. 


Panoramix is an audio workstation that was
primarily designed for the post-production of
3D audio materials. The needs and motivations
for such a tool have been discussed in previous
publications [Carpentier, 2016; Carpentier and
Cornuau, 2016]: panoramix typically addresses
the post-production of mixed music concerts¹
where the sound recording involves a large set
of heterogeneous elements (close microphones,
ambient miking, surround or Ambisonic
microphone arrays, electronic tracks, etc.). During
the post-production stage, the sound engineers
need tools for spatializing sonic sources (e.g.,
spot microphones or electronic tracks), encoding
and decoding Ambisonic materials, adding
artificial reverberation, combining and mixing
the heterogeneous sound layers, as well as
rendering, monitoring and exporting the final
mix in multiple formats. Panoramix provides
a unified framework covering all the required
operations, and it allows seamless integration of
all spatialization paradigms: channel-based,
scene-based, and object-based audio.

1 The practical use of the software in such a context
has also been demonstrated in the above-mentioned
publications, through the case study of an electro-acoustic
piece by composer Olga Neuwirth.

Besides post-production purposes, panoramix
is also suitable for the diffusion of sound in
live events since the audio engine operates in
realtime and without latency². Indeed, it has
recently been used by sound engineers and
computer musicians in order to control the
sound spatialization for live productions at
Ircam.

2 Architecture 

The general architecture of the workstation has 
been presented in previous work [Carpentier,
2016]. In a nutshell, the panoramix signal flow
consists of input tracks which are sent to busses 
dedicated to spatialization and reverberation 
effects. All busses are ultimately collected into 
the Master strip, which delivers the signals to 
the output audio driver. Each channel strip 
in the workstation comes with a set of specific 
DSP features. 

One major improvement of the new version
presented here is the introduction of “parallel
bussing”: each track can be sent to multiple
busses in parallel³. The benefit of such a
parallel bussing architecture is twofold. It makes
it possible:

1) to simultaneously produce a mix in multiple
formats: tracks can for instance be sent to a
VBAP bus and to an Ambisonic bus; both
busses are rendered in parallel, with shared
settings, and it is fast and easy to switch from
one to another, e.g., for A/B comparison.

2) to “hybridize” spatialization techniques: for
instance, when producing binaural mixes, it is
sometimes useful to combine “true” binaural
synthesis (or recordings) with conventional
stereophony. Adjusting the level of the two
parallel busses, the sound engineer can balance
between the 3D layer (with well-known binaural
artifacts such as timbral coloration, front-back
confusions, in-head localization, etc.) and the
stereo layer (often considered as more robust
and spectrally transparent). Such hybridization
appeared especially useful and convincing when
producing content intended for non-individual
HRTF listening conditions.

2 Only a few specific DSP treatments may induce a
latency, e.g., the encoding of Eigenmike signals (discussed
later in this paper). Also there is the irreducible latency
of the audio I/O device.

3 The number of parallel sends is currently restricted
to three busses, referred to as A/B/C. In practical mixing
situations, it appeared useless to provide more than
three sends although there is no technical constraint
preventing an increase of this limit.

Figures 1 and 2 present the signal-flow
graph of the tracks and busses respectively.
They also show how the signal processing
blocks relate to the controllers exposed in the
user interface (see also Figure 4 for a general
view of this interface).

When parallel bussing is involved, some
elements of the depicted audio graph are
replicated and run concurrently.


Figure 1: Anatomy of a track: a track is essentially 
used for pre-processing the incoming audio source 
(compression, equalization, delay, etc.) and for gen¬ 
erating a set of early reflections that will later 1) 
feed the late reverb FDN and 2) be spatialized. 


The overall processing architecture is inspired
by the Spat design [Jot and Warusfel, 1995;
Jot, 1999; Carpentier et al., 2015], which tightly
combines an artificial reverberation engine with
a panning module. This framework relies on a
simplified space-time-frequency model of room
acoustics wherein the generated room effect
is divided in four temporal segments (direct
sound, early reflections, late reflections, and
reverb tail); each segment is individually filtered
and then spatialized (direct sound and early
reflections are localized as point sources while the
late segments are spatially diffuse).

In the first release of panoramix, only the filter¬ 
ing of direct sound was proposed. In the pre¬ 
sented version, we have introduced additional 
filters for the early and late reflection sections, 
therefore extending the range of possible effects. 

Figure 2: Anatomy of a bus: the purpose of a bus is
twofold: it generates a late/diffuse reverberation tail 
(shared amongst multiple tracks for efficiency) and 
it provides control over the spatialization rendering. 
The lefthand side (violet frame) depicts the panning 
bus; the righthand side (red frame) represents the 
late reverb bus. 


Note that the number of tracks, busses and
channels per strip is unlimited, only restricted
by the available computing power.


3 Main features 

This section presents the main functionalities of 
the software, with an emphasis on newly added 
features. The interested reader may also refer
to [Carpentier, 2016].

3.1 2D panpot

The first version of panoramix focused
exclusively on 3D rendering approaches, namely
VBAP [Pulkki, 1997], Higher Order Ambisonics
(HOA) [Daniel, 2001], and binaural [Møller,
1992]. It rapidly appeared convenient to also
integrate 2D techniques, as it is common practice
to add horizontal-only layers even when mixing
for 3D formats. A number of traditional
2D techniques have therefore been implemented
(time and/or intensity panning laws such as
2D-VBAP or VBIP [Pernaux et al., 1998], etc.).
The workstation now offers a broad range of
algorithms, being able to address arbitrary
loudspeaker layouts.

3.2 Ambisonic processing 

Higher Order Ambisonic (HOA) is a recording 
and reproduction technique that can be used 
to create spatial audio for circular or spheri¬ 
cal loudspeaker arrangements. It has been sup¬ 
ported in the workstation since its origin, and 
further improvements have been made, espe¬ 
cially in the encoding and transformation mod¬ 
ules. 


3.2.1 HOA encoding 

Compact spherical microphone arrays such as
the Eigenmike⁴ are sometimes used for music
recordings as they are able to capture natural
sound fields with high spatial resolution. The
signals captured by such pickup systems do not
directly correspond to HOA components; an
encoding stage is required. Such encoding usually
requires regularizing the modal radial filters
as they are ill-conditioned for certain
frequencies. Various equalization approaches have
been proposed in the literature, in particular:
Tikhonov regularization [Moreau, 2006; Daniel
and Moreau, 2004], soft-limiting [Bernschütz et
al., 2011], and a filter bank applied in the modal
domain [Baumgartner et al., 2011]. There is yet no
consensus about which method is the most
appropriate; consequently they have all been
implemented in panoramix. The user can also set
a maximum amplification factor.
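
To illustrate how a maximum amplification factor can bound a
regularized inverse filter, the following Python sketch applies a
generic Tikhonov-style inversion to a single modal coefficient. It is
a textbook formulation given under stated assumptions, not the
panoramix implementation: the function name and the default limit of
20 dB are hypothetical, and the modal coefficient b is assumed to be
given (computing it would require spherical Bessel and Hankel
functions, which are omitted).

```python
def regularized_inverse(b, max_gain_db=20.0):
    """Tikhonov-regularized inverse of a modal coefficient b.

    The plain inverse 1 / b grows without bound where |b| is small.
    The regularized form conj(b) / (|b|^2 + lam^2) peaks at |b| = lam
    with gain 1 / (2 * lam), so lam is derived from the desired limit.
    """
    max_gain = 10.0 ** (max_gain_db / 20.0)
    lam = 1.0 / (2.0 * max_gain)
    b = complex(b)
    return b.conjugate() / (abs(b) ** 2 + lam ** 2)

if __name__ == "__main__":
    for magnitude in (1.0, 0.1, 0.05, 0.001):
        print(magnitude, round(abs(regularized_inverse(magnitude)), 3))
```

For well-conditioned coefficients the result stays close to the plain
inverse, while the gain never exceeds the chosen limit.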

Besides HOA recordings, it is also possible to 
synthesize Ambisonic virtual sources and there 
is no restriction on the maximum encoding or¬ 
der. 

Note finally that panoramix supports all usual
HOA normalization (N3D, N2D, SN3D, SN2D,
FuMa, MaxN) and sorting (ACN, SID,
Furse-Malham) schemes.

4 http://www.mhacoustics.com
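
Two of the conventions named above are simple enough to state in a
few lines. The Python sketch below shows the ACN channel index for
order n and degree m, and the conversion between SN3D and N3D
normalization, which differ by a factor of sqrt(2n + 1) per order.
The function names are chosen for illustration and do not refer to
any panoramix interface.

```python
import math

def acn_index(n, m):
    """ACN channel number for order n and degree m (-n <= m <= n)."""
    return n * n + n + m

def acn_to_order_degree(acn):
    """Inverse mapping: ACN channel number to (order, degree)."""
    n = math.isqrt(acn)
    return n, acn - n * n - n

def sn3d_to_n3d(channels):
    """Scale ACN-ordered SN3D components to N3D."""
    return [x * math.sqrt(2 * math.isqrt(i) + 1) for i, x in enumerate(channels)]

def n3d_to_sn3d(channels):
    """Scale ACN-ordered N3D components to SN3D."""
    return [x / math.sqrt(2 * math.isqrt(i) + 1) for i, x in enumerate(channels)]
```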

3.2.2 HOA manipulations 

One benefit of the Ambisonic formalism is that
a HOA stream can be flexibly manipulated so as
to alter the spatial properties of the sound field.
In addition to 3D rotations of the sound field
[Daniel, 2001; Daniel, 2009], two new
transformation operators have been recently
integrated into the workstation:

1) a directional loudness processor [Kronlachner
and Zotter, 2014], which makes it possible to
spatially emphasize certain regions of the sound
field (Figure 3), and

2) a spatial blur effect [Carpentier, 2017], which
reduces the resolution of an Ambisonic stream,
in effect simulating fractional order
representation and varying the “blurriness” of the
spatial image.

These transformation operators are achieved by
applying a (time and frequency independent)
transformation matrix in the Ambisonic
domain. The implementation is therefore very
efficient, making them suitable for realtime
automation.
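
As a minimal illustration of a transformation matrix applied in the
Ambisonic domain, the following Python sketch rotates a first-order
stream about the vertical axis. Rotation is used here because its
matrix is simple to verify; the matrices of the directional loudness
and blur operators are not reproduced. ACN channel order (W, Y, Z, X)
and SN3D normalization are assumed, and the function names are
hypothetical.

```python
import math

def encode_first_order(azimuth_deg, elevation_deg=0.0):
    """Encode a plane wave into first-order components [W, Y, Z, X]."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [1.0,
            math.sin(az) * math.cos(el),
            math.sin(el),
            math.cos(az) * math.cos(el)]

def yaw_matrix(angle_deg):
    """4 x 4 matrix rotating the sound field counterclockwise about z."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, 0.0, s],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, -s, 0.0, c]]

def apply_matrix(matrix, frame):
    """Multiply one frame of Ambisonic components by the matrix."""
    return [sum(row[j] * frame[j] for j in range(len(frame))) for row in matrix]

def rotate_yaw(frame, angle_deg):
    return apply_matrix(yaw_matrix(angle_deg), frame)
```

Because the matrix does not depend on time or frequency, the same
multiplication is applied to every sample frame, which keeps the cost
low.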



Figure 3: HOA localization interface: the simple
user interface allows steering one or multiple virtual
beams in space; the radial axis is used to control the
“selectivity” of the virtual beam (from omnidirectional
to highly directional). This is especially useful
in post-production contexts, either to emphasize
the sound from certain directions (e.g., instruments)
or to attenuate undesired regions.


3.2.3 HOA decoding 

A HOA bus serves as a decoder (with respect
to a given loudspeaker layout) and it comes
with a comprehensive set of decoding flavors
including: sampling Ambisonic decoder [Daniel,
2001], mode-matching [Daniel, 2001],
energy-preserving [Zotter et al., 2012], and all-round
decoding [Zotter and Frank, 2012]. In addition,
dual-band decoding is possible, with adjustable
crossover frequency, and in-phase or max-rE
optimizations can be applied in each band [Daniel,
2001].


3.3 Binaural rendering 

Panoramix implements binaural synthesis for
3D rendering over headphones. It is possible
to load HRTF in the SOFA/AES-69 format
[Majdak et al., 2013]. Two SOFA conventions
are currently supported: “SimpleFreeFieldHRIR”
for convolution with HRIR,
and “SimpleFreeFieldSOS” for filtering with
HRTF represented as second-order sections and
interaural time delay⁵.

SOFA data can be either loaded from a local
file or remotely accessed through the OpenDAP
protocol [Carpentier, 2015a; Carpentier et al.,
2014a]. The binaural bus features a user
interface for rapid navigation/search through the
available SOFA files (Figure 5).





Figure 5: UI for loading or downloading SOFA
files. ① Filters for quick search. ② Text search field.
③ Results matching query.


3.5 OSC communication 


All parameters of the panoramix application
can be remotely accessed via the Open Sound
Control (OSC) protocol [Wright, 2005]. This
fosters easy and efficient communication with
other applications (e.g., Pd) or external devices
(e.g., head-tracker for realtime binaural
rendering).
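
For readers unfamiliar with the wire format, the following Python
sketch builds an OSC 1.0 message with float arguments and sends it
over UDP using only the standard library. The address pattern
"/track/1/azim", the host and the port number are hypothetical
placeholders; the actual OSC namespace of panoramix is not given in
this paper and should be taken from the status window or the
documentation of the application.

```python
import socket
import struct

def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of four bytes (OSC strings)."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC message whose arguments are all 32-bit floats."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", v) for v in values)  # big-endian
    return _pad(address.encode("ascii")) + _pad(type_tags.encode("ascii")) + payload

def send_osc(message: bytes, host: str = "127.0.0.1", port: int = 4002) -> None:
    """Send one datagram; host and port are assumptions for the example."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))

if __name__ == "__main__":
    send_osc(osc_message("/track/1/azim", 45.0))
```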

OSC communication may also be used for
remote automation with a digital audio
workstation (DAW) through the ToscA plugin
[Carpentier, 2015b]. Note, however, that
the latter has not yet been ported to the Linux
platform.

A dedicated window makes it possible to monitor
the current OSC state of the panoramix engine (see
⑦ in Figure 4). Also, the mixing session itself
is stored to disk as a “stringified” OSC bundle
(human readable and editable).

3.6 Enhanced productivity 

A number of other features have been added
for enhanced productivity, compared to
previous versions. These include: a large set of
keyboard shortcuts (the key mapping can
further be customized and stored, see ⑧ in
Figure 4) for handling most common tasks (create
new tracks, enable/disable groups, etc.), tooltip
pop-ups that present inline help tips, the
possibility to split the console window in multiple
windows (especially useful when using multiple
screens and dealing with a high number of
tracks), etc.

3.4 Reverberation

As mentioned in previous sections, panoramix
embeds a reverberation engine that makes it
possible to generate artificial room effects during
the mixing process. The reverb processor currently
used is a feedback delay network (FDN)
originally designed by [Jot and Chaigne, 1991]. This
FDN is particularly flexible and scalable; in
typical use-cases, it involves eight feedback
channels and provides decay control in three
frequency bands.

In addition, there is ongoing work
to further integrate convolution-based or hybrid
reverberators [Carpentier et al., 2014b] in the
bus architecture.

4 Software aspects and Linux port

Panoramix was originally developed as a set
of two Max/MSP⁶ externals (panoramix~ for
the DSP rendering and panoramix for the
GUI controller) and released in the form of a
Max standalone application for macOS and
Windows.

The DSP code is written in C++. It is
OS-independent, host-independent (i.e. it does
not rely on Max/MSP) and highly optimized,
extensively using vectorized SIMD instructions
and high performance functions from the
Intel® Integrated Performance Primitives⁷.
The application can easily handle dozens or
even hundreds of tracks on a modern computer.
The GUI component, also written in C++, is
built with the Juce⁸ framework, which
facilitates cross-platform development and provides
a large set of useful widgets.

5 See www.sofaconventions.org for further details on
SOFA conventions.

6 http://www.cycling74.com

7 http://software.intel.com

Figure 4: panoramix running in Ubuntu with Jack server (QjackCtl) as audio driver. ① Input tracks in the
mixing console. ② Spatialization and reverberation busses. ③ Geometrical representation of the sound scene.
④ Parametric equalizer. ⑤ Group management. ⑥ Jack server and inter-application connections. ⑦ Status
window: shows the current state of the engine and all parameters exposed to OSC messaging.
⑧ Shortcut window: for editing the key mappings. ⑨ OSC setup window: configures input and output
ports for remote communication.


For the Linux environment, the initial plan
was to port the Max externals to Pure
Data (Pd) [Puckette, 1997]. Porting the DSP
engine is straightforward as the Max and Pd
APIs are relatively similar in this regard.
Porting the GUI object, however, was problematic:
Pd uses Tcl/Tk as its windowing system, and
to the best of the author’s knowledge there
is no easy way to embed GUI components
developed with other frameworks (such as Juce
or Qt) in the Tk engine. As an alternative, it
was decided to create an autonomous
application, handling both the GUI and the audio
engine (i.e. an “AudioAppComponent” in
Juce’s dialect). The application thus operates
independently of any host engine (Pd or Max)
and it processes the audio directly to/from the
audio devices. Furthermore, it is compatible
with the Jack Audio Connection Kit
(http://www.jackaudio.org), which
makes it pluggable with potentially any audio
application. In typical use-cases, a digital
audio workstation such as Ardour¹¹ is used to
send audio streams to the panoramix processor.
The processed buffers may be re-routed to the
DAW, e.g., for bouncing, or directly played
back through the output device (see Figure 6).

8 http://juce.com/



Figure 6: Typical workflow. 


11 http://www.ardour.org

5 Conclusions

This paper discussed the Linux port of an au¬ 
dio engine designed for the diffusion, mixing, 
and post-production of spatial audio. We high¬ 
lighted several new features that extend the pos¬ 
sibilities of the tool and improve productivity 
and user experience. Future work will mainly 
focus on the integration of convolution-based 
reverberation into the framework herein pre¬ 
sented. 

Abstract

For a sound synthesis programming class in C/C++,
a Raspberry Pi 3 was used as runtime and develop¬ 
ment system. The embedded system was equipped 
with an Arch Linux ARM, a collection of libraries 
for sound processing and interfacing, as well as with 
basic examples. All material used in and created 
for the class is freely available in public git reposito¬ 
ries. After a unit of theory on sound synthesis and 
Linux system basics, students worked on projects 
in groups. This paper is a progress report, point¬ 
ing out benefits and drawbacks after the first run of 
the seminar. The concept delivered a system with 
acceptable capabilities and latencies at a low price. 
Usability and robustness, however, need to be im¬ 
proved in future attempts. 

Keywords 

Jack, Raspberry Pi, C/C++ Programming, Educa¬ 
tion, Sound Synthesis, MIDI, OSC, Arch Linux 

1 Introduction 

The goal of the seminar outlined in this paper
was to enable students with different
backgrounds and programming skills to develop
standalone real-time sound synthesis
projects. Such a class on the programming
of sound synthesis algorithms is, among other
things, defined by the desired level of depth in
signal processing. It may convey an
application-oriented overview or a closer look at
algorithms on a sample-wise signal processing
level, as in this case. The tools and the
programming environment should be chosen on
the basis of this decision.

Scripting languages like Matlab or Python are
widely used among students and offer a
comfortable environment, especially for students
without a background in computer science. They
are well suited for teaching fundamentals and
theoretical aspects of signal processing due to
advanced possibilities of debugging and
visualisation. Although real-time capabilities can be
added, they are not considered for exploring
applied sound synthesis algorithms in this class.

In a more application-based context, graphical
programming environments like Pure Data
(Pd), Max/MSP or others would be the first
choice. They allow rapid progress and
intermediate results. However, by their nature
they are not the best platform for deepening
the knowledge of sound synthesis algorithms
on a sample-wise level.

C/C++ delivers a reasonable compromise be¬ 
tween low level access and the comfort of using 
available libraries for easy interfacing with hard¬ 
ware. For using C/C++ in the development 
of sound synthesis software, a software devel¬ 
opment kit (SDK) or application programming 
interface (API) is needed in order to offer access 
to the audio hardware. 

Digital audio workstations (DAW) use plug¬ 
ins programmed with SDKs and APIs like Stein¬ 
berg’s VST, Apple’s Audio Units, Digidesign’s 
RTAS, the Linux Audio Developer’s Simple 
Plugin API (LADSPA) and its successor LV2, 
which offer quick access to developing audio 
synthesis and processing units. A plugin-host 
is necessary to run the resulting programs and 
the structure is predefined by the chosen plat¬ 
form. The JUCE framework [34] offers the 
possibility of developing cross platform applica¬ 
tions with many builtin features. Build targets 
can be audio-plugins for different systems and 
standalone applications, including Jack clients, 
which would present an alternative to the cho¬ 
sen approach. 

Another possibility is the programming of
Pd externals in the C programming language,
with the advantage of quick merging of the
self-programmed components with existing Pd
internals. FAUST [6] also provides means for
creating various types of audio plugins and
standalone applications. It was not chosen because
of a lack of functional programming background.
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For various reasons we settled for the Jack 
API [18] to develop command line programs on 
a Linux system. 

• Jack clients are used in the research on bin¬ 
aural synthesis and sound field synthesis, 
for example by the WONDER interface [15] 
or the SoundScape Renderer [14]. Results 
of the projects might thus be potentially 
integrated into existing contexts. 

• Jack clients offer quick connection to other 
clients, making them as modular as audio¬ 
plugins in a DAW. 

• The Jack API is easy to use, even for be¬ 
ginners. Once the main processing func¬ 
tion is understood, students can immedi¬ 
ately start inserting their own code. 

• The omission of a graphical user interface 
for the application leaves more space for 
focusing on the audio-related problems. 

• The proposed environment is independent 
of proprietary components. 

A main objective of the seminar was to equip 
the students with completely identical systems 
for development. This avoids troubles in han¬ 
dling different operating systems and hardware 
configurations. Since no suitable computer pool 
was available, we aimed at providing a set of 
machines as cheap as possible for developing, 
compiling and running the applications. The 
students were thus provided with a Raspberry 
Pi 3 in groups of two. Besides being one of the 
cheapest development systems, it offers the ad¬ 
vantage of resulting in a highly portable, quasi 
embedded synthesizer for the actual use in live 
applications. 

The remainder of this paper is organized as 
follows: Section 2 introduces the used hard- and 
software, as well as the infrastructure. Section 3 
presents the concept of the seminar. Section 4 
briefly summarizes the experiences and evalu¬ 
ates the concept. 

2 Technical Outline 
2.1 Hardware 

A Raspberry Pi 3 was used as development
and runtime system. The most recent version
at that time was equipped with a 1.2 GHz 64-bit
quad-core ARMv8 CPU, 802.11n Wireless LAN,
a Bluetooth adapter, 1 GB RAM, 4 USB ports,
40 GPIO pins and various other features [11].


Unfortunately the on-board audio interface 
of the Raspberry Pi could not be configured 
for real-time audio applications. It does not 
feature an input and the Jack server could 
only be started with high latencies. After try¬ 
ing several interfaces, a Renkforce USB-Audio- 
Adapter was chosen, since it delivered an ac¬ 
ceptable performance at a price of 10 €. The 
Jack server did perform better with other inter¬ 
faces, yet at a higher price and with a larger 
housing. 

Students were equipped with MIDI interfaces
from the stock of the research group and private
devices. The complete cost for one system,
including the Raspberry Pi 3 with housing, SD
card, power adapter and the audio interface,
was thus kept at about 70 € (vs. the integrated
low-latency platform Bela [5], which still ranks
at around 120 € per unit).

2.2 Operating System 

First tests on the embedded system involved 
Raspbian [12], as it is highly integrated and 
has a graphical environment preinstalled, that 
eases the use for beginners. Integration with 
the libraries used for the course proved to be 
more complicated however, as they needed to 
be added to the software repository for the ex¬ 
amples. 

A more holistic approach, integrating the op¬ 
erating system, was aimed at, in order to leave 
as few dependencies on the students’ side as pos¬ 
sible and not having to deal with the hosting of 
a custom repository of packages for Raspbian. 
Due to previous experience with low latency se¬ 
tups using Arch Linux [7], Arch Linux ARM 
[2] was chosen. With its package manager pac- 
man [9] and the Arch Build System [1] an easy 
system-wide integration of all used libraries and 
software was achieved by providing them as pre¬ 
installed packages (with them either being avail¬ 
able in the standard repositories, or the Arch 
User Repository[3]). Using Arch Linux ARM, 
it was also possible to guarantee a systemd [13] 
based startup of the needed components, which 
is further described in Section 2.5. At the time 
of preparation for the course, the 64bit vari¬ 
ant (AArch64) - using the mainline kernel - was 
not available yet. At the time of writing it is 
still considered experimental, as some vendor 
libraries are not yet available for it. Instead the 
ARMv7 variant of the installation image was 
used, which is the default for Raspberry Pi 2. 
2.3 Libraries

All libraries installed on the system are listed 
in Tab. 1. For communicating with the au¬ 
dio hardware, the Jack API was installed. The 
jackcpp framework adds a C++ interface to 
Jack. Additional libraries allowed the handling 
of audio file formats, ALSA-MIDI interfacing, 
Open Sound Control, configuration files and 
Fast Fourier Transforms. 


Table 1: Libraries installed on the development
system

Library   Ref.   Purpose
jack2     [18]   Jack audio API
jackcpp   [26]   C++ wrapper for jack2
sndfile   [19]   Read and write audio files
rtmidi    [32]   Connect to MIDI devices
liblo     [23]   OSC support
yaml      [20]   Configuration files
fftw3     [22]   Fourier transform
boost     [4]    Various applications


2.4 Image 

The image for the system is available for
download (see footnote 1) and can be requested
from the authors by e-mail should it become
unavailable in the future. Installation of the
image follows the standard procedure of an Arch
Linux ARM installation for Raspberry Pi 3, using
the blockwise copy tool dd, which is documented
in the course’s git repository [31].

2.5 System Settings 

The most important goal was to achieve a
round-trip latency below 10 ms. This would not
be sufficient for real-time audio applications in
general, but it is adequate for teaching purposes.
With the hardware described in Section 2.1,
stable Jack server command line options were
evaluated, leading to a round-trip latency of
2.9 ms:

/usr/bin/jackd -R \
    -p 512 \
    -d alsa \
    -d hw:Device \
    -n 2 \
    -p 64 \
    -r 44100
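
The 2.9 ms figure matches the nominal buffer latency implied by the period settings of the command (2 periods of 64 frames at 44100 Hz). The following is a minimal sketch of that arithmetic; the function name is hypothetical, and delays added by the converters or the USB transport are not modeled here.

```python
def nominal_buffer_latency_ms(frames_per_period, periods, sample_rate):
    """Nominal latency of the audio buffer in milliseconds.

    Only the buffering itself is considered (assumption); hardware
    and driver delays would come on top of this value.
    """
    return 1000.0 * frames_per_period * periods / sample_rate


# Values taken from the jackd options: -p 64, -n 2, -r 44100
latency_ms = nominal_buffer_latency_ms(64, 2, 44100)
print(f"{latency_ms:.1f} ms")
```

Halving the period size or the number of periods halves this value, which is why these two options are the main handles when tuning a Jack server for low latency.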


1 https://www2.ak.tu-berlin.de/~drunge/klangsynthese


As these settings were not realizable with the 
internal audio card, the snd-bcm2835 module - 
the driver in use for it - was blacklisted using 
/etc/modprobe.d/*, to not use the sound device 
at all. 

For automatic start of the low-latency au¬ 
dio server and its clients, systemd user ser¬ 
vices were introduced, that follow a user session 
based setup. The session of the main system 
user is started automatically as per systemd’s 
user@.service. This is achieved by enabling the 
linger status of said user with the help of loginctl 
[8], which activates its session during boot and 
starts its enabled services. 

A specialized systemd user service [30] allows
for Jack’s startup with an elevated CPU
scheduling priority without the use of dbus [21].
The students’ projects could be started
automatically as user services that depend on
the audio server being started, by enabling
services such as this example:

[Unit]
Description=Example project
After=jack@rpi-usb-44100.service

[Service]
ExecStart=/path/to/executable \
    parameter1 \
    parameter2
Restart=on-failure

[Install]
WantedBy=default.target

2.6 Infrastructure 

To give all students in the class access via
SSH, a WiFi network was set up, providing fixed
IP addresses for all Raspberry Pis. Network
performance proved insufficient to handle all
participants simultaneously, though. Thus,
additional parallel networks were installed.

For home use, students were instructed to
provide a local network with their laptops, or
to use their home network over WiFi or cable.
Depending on their operating system, this was
more or less complicated.

3 The Seminar 

The seminar was addressed to graduate stu¬ 
dents with different backgrounds, such as com¬ 
puter science, electrical engineering, acoustics 
and others. One teacher and one teaching assis¬ 
tant were involved in the planning, preparation 
and execution of the classes. 

The course was divided into a theoretical and
a practical part. In the beginning, theory and
basics were taught in mixed sessions. The final
third of the semester was dedicated to
supervised project work.

3.1 System Introduction 

Students were introduced to the system with
an overview of the tools needed and a
presentation of the libraries and their
interfaces. Users of Linux, Mac and Windows
operating systems took part in the class. Since
many students lacked basic Linux skills, first
steps included using Secure Shell (SSH) to access
the devices, which proved to be hard for
attendees without any knowledge of the
command-line interface. Windows users were
assisted in installing and using PuTTY [10] for
connecting, as Windows provided no native SSH
client. After two sessions, each group was able
to reach the Raspberry Pi in class, as well as at
home.

3.2 Sound Synthesis Theory 

Sound synthesis approaches were introduced 
from an algorithmic point of view, showing ex¬ 
amples of commercially available implementa¬ 
tions, also regarding their impact on music pro¬ 
duction and popular culture. Students were 
provided with ready-to-run examples from the 
course repository for some synthesis approaches, 
as well as with tasks for extending these. 

Important fundamental literature was
provided by Zölzer [35], Pirkle [27] and Roads [29].
The taxonomy of synthesis algorithms proposed
by Smith [33] was used to structure the outline
of the class, as follows.

A section on Processed Recording dealt with 
sampling and sample-based approaches, like 
wave-table synthesis, granular synthesis, vector 
synthesis and concatenative synthesis. 

Subtractive synthesis and analog modeling
were treated as the combination of the basic
units: oscillators, filters and envelopes. Filters
were studied more closely, considering IIR
and FIR filters and design methods like the
bilinear transform. A ready-to-run biquad example
was included and a first-order low-pass was
programmed in class.
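
As an illustration of the kind of first-order low-pass mentioned above, the following is a minimal sketch of a one-pole IIR filter, y[n] = (1 - a) x[n] + a y[n-1]. It is not the code used in class; the function name and the mapping from cutoff frequency to the coefficient a (a common exponential approximation) are assumptions.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Filter a list of samples with a first-order (one-pole) low-pass.

    Difference equation: y[n] = (1 - a) * x[n] + a * y[n - 1]
    The coefficient mapping below is an assumed approximation.
    """
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0  # filter state, y[n - 1]
    out = []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


# A constant input passes with unity gain once the filter has settled,
# while a signal alternating at the Nyquist frequency is attenuated.
dc = one_pole_lowpass([1.0] * 2000, 1000.0, 44100)
nyquist = one_pole_lowpass([1.0, -1.0] * 1000, 1000.0, 44100)
```

Because the state is a single variable, the same loop body can be placed directly inside a sample-wise audio callback.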

Additive Synthesis and Spectral Modeling 
were introduced by an analysis-resynthesis pro¬ 
cess of a violin tone in the SMS model [25], con¬ 
sidering only the harmonic part. 

Physical Modeling was treated for plucked
strings, starting with the Karplus-Strong
algorithm [17], advancing to bidirectional
models [24].
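
The basic Karplus-Strong algorithm can be sketched in a few lines: a delay line filled with noise is read out cyclically, and each sample is replaced by the average of two successive samples. This is a minimal illustration under assumed parameter names and an assumed extra damping factor, not the course material.

```python
import random


def karplus_strong(frequency_hz, duration_s, sample_rate=44100,
                   damping=0.996, seed=0):
    """Plucked-string tone: y[n] = damping * 0.5 * (y[n-N] + y[n-N-1])."""
    rng = random.Random(seed)
    period = int(round(sample_rate / frequency_hz))  # delay length N
    # Excitation: one period of white noise in the delay line
    delay_line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    previous = 0.0  # y[n - 1]
    out = []
    for n in range(int(duration_s * sample_rate)):
        i = n % period
        current = delay_line[i]  # y[n], written one period earlier
        out.append(current)
        # Averaging acts as the loop low-pass that makes the tone decay
        delay_line[i] = damping * 0.5 * (current + previous)
        previous = current
    return out


tone = karplus_strong(220.0, 0.5)
```

The integer delay length limits the tuning accuracy at high pitches, which is one motivation for the extended variants of the algorithm.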

FM Synthesis [16] was treated as a repre¬ 
sentative of abstract algorithms in the class. 
The concept was mainly taught by a closer look 


at the architecture and programming of the 
Yamaha DX7 synthesizer. 
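
The core of FM synthesis is a sine carrier whose phase is modulated by a second sine oscillator. The sketch below shows a single carrier-modulator pair with a constant modulation index; function and parameter names are assumptions, and it does not reproduce the DX7's operator architecture or envelopes.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, duration_s,
            sample_rate=44100):
    """y(t) = sin(2*pi*fc*t + index * sin(2*pi*fm*t))."""
    out = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        modulation = index * math.sin(2.0 * math.pi * modulator_hz * t)
        out.append(math.sin(2.0 * math.pi * carrier_hz * t + modulation))
    return out


# An index of 0 yields a plain sine; larger values add sidebands
# spaced at multiples of the modulator frequency around the carrier.
plain = fm_tone(440.0, 110.0, 0.0, 0.01)
bright = fm_tone(440.0, 110.0, 5.0, 0.01)
```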

3.3 Projects 

Out of the 35 students who attended the first
meetings, 18 worked on projects throughout the
whole semester. It should be noted that (with
one exception) only Linux and Mac users
continued.

No restrictions were made regarding the
choice of the topic, except that it should result
in an executable program on the Raspberry Pi.
The student projects included:

• A vector synthesis engine, allowing the 
mixture of different waveforms with a suc¬ 
ceeding filter section 

• A subtractive modeling synth with modu¬ 
lar capabilities 

• A physical string model, based on the
extended Karplus-Strong algorithm with a
dispersion filter

• A sine-wave speech synthesis [28] effect,
which includes a real-time FFT

• A guitar-controlled subtractive synthesizer, 
using zero-crossing rate for pitch detection 

• A wave-digital-filter implementation with 
sensor input from the GPIOs 

In order to provide a more suitable platform 
for running embedded audio applications, one 
group used buildroot 2 to create a custom oper¬ 
ating system. 

4 Conclusions 

The use of the Raspberry Pi 3 for the
programming of Jack audio applications proved to
be a promising approach. A system with
acceptable capabilities and latencies could be
provided at a low price. Accessibility and
stability, however, need to be improved in future
versions: a considerable amount of time was
spent working on these issues in class, and the
progress in the projects was therefore delayed
considerably. The overhead in handling Linux
proved to be a major problem for some students
and probably caused some people to drop out. A
possible step would be to provide a set with
monitor, keyboard and mouse, as this would
increase accessibility. The stability of the Jack
server needs to be worked on, as sometimes the
hardware would not be started properly, leading
to crashing Jack clients.

2 https://buildroot.org

In future seminars, which are likely to be con¬ 
ducted after these principally positive experi¬ 
ences, most of the issues should be worked out 
and the image, as well as the repository will 
have been improved. 
Abstract

Hearing aids help hearing impaired users partici¬ 
pate in the communication society. Development 
and improvement of hearing aid signal processing 
algorithms takes place in the industry and in aca¬ 
demic research. With openMHA, we present a de¬ 
velopment and evaluation platform that is able to 
execute hearing aid signal processing in real-time 
on standard computing hardware with a low delay 
between sound input and output. We lay out the 
application specific requirements and present how 
openMHA meets these and will be helpful in future 
research in the field of signal processing for hearing 
aids. 

Keywords 

Hearing aids, audio signal processing, plugin host 

1 Introduction 

Development of hearing aid signal processing is 
widely conducted by hearing aid manufacturers 
on proprietary systems that are not accessible 
to the research community and that underlie 
commercial constraints. Providing open tools 
to the hearing aid research community lowers 
barriers, accelerates studies with novel acoustic 
processing algorithms and facilitates translation 
of these advances into widespread use with hear¬ 
ing aids, cochlear implants, and consumer elec¬ 
tronics devices for sub-clinical hearing support. 
A software platform for the development and 
evaluation of hearing aid algorithms should 

• offer a complete set of hearing aid signal 
processing reference algorithms that can be 
combined with newly developed algorithms 
to form a complete hearing aid signal pro¬ 
cessing chain, 

• enable researchers to perform offline- 
processing as well as real-time signal pro¬ 
cessing with a reliable low delay between 
acoustic input and output of less than 10 
milliseconds, even when algorithms need 
significant processing power, 


• provide a library for common signal pro¬ 
cessing tasks and commonly needed ser¬ 
vices in hearing aid signal processing, like 
support for acoustic calibration and filter- 
banks, 

• be able to run on a wide range of hardware, 
from high-performance PCs to execute 
bleeding-edge algorithms in real-time, to 
portable, power-efficient, headless, battery- 
powered devices for improved testing capa¬ 
bilities in realistic usage scenarios and field 
tests. 

Several open-source tools for audio signal pro¬ 
cessing, that can also be used in hearing aid 
research, exist: 

Octave. Octave is actively used in hearing aid 
research for the development of signal process¬ 
ing algorithms for hearing aids. It is a suitable 
tool to quickly develop, change and evolve iso¬ 
lated algorithms as long as no real-time audio 
processing is required. However, Octave is
unsuitable for executing hearing aid algorithms in
real-time with live input and output sound
signals with low delay [Eaton et al., 2015].

NumPy/SciPy. Scientific Computing Tools 
for Python enable researchers to develop signal 
processing algorithms. Technically, this
software platform is equivalent to Octave, but it is
to our knowledge currently not actively used in
hearing aid research [Jones et al., 2001].

Pure Data. Pd is a real-time signal processing
platform. It features a graphical programming
interface. Pd is actively used mainly by
artists to perform signal processing of music and
other data. Pd can achieve a low delay in
real-time processing. In principle it would be
possible to develop hearing aid signal processing
algorithms on Pd, and have these algorithms
process audio signals in real-time. We are not
aware of any hearing aid research being performed
on the Pd platform and would consider it too
laborious to implement modern hearing aid
algorithms in the graphical programming
environment. Pd can be extended with C;
therefore, hearing aid algorithms could be
implemented for Pd in C or C++ [Puckette, 1996].


Plugin hosts. Various plugin hosts for differ¬ 
ent plugin architectures (VST, LADSPA, LV2) 
exist, that can load and combine algorithms 
in plugins to form complex signal processing 
chains. Most hosts can achieve a low delay in 
real-time audio processing. Plugins can be writ¬ 
ten in C or C++ using the plugin-architecture 
specific SDK. (Using the VST SDK requires 
signing a license agreement.) Plugin hosts are 
mainly used by sound engineers and also by 
artists to process recorded or live music and 
other sounds. 


Signal processing toolboxes and languages.
A signal processing toolbox like the Synthesis
ToolKit (STK) [Cook and Scavone, 1999] and
domain-specific languages (DSL) like
SuperCollider [McCartney, 2002] and Faust
[Orlarey et al., 2009] provide useful signal
processing primitives to ease development of audio
signal processing algorithms. We are not aware of
any hearing aid research being performed using
these toolboxes and DSLs.


While the dynamic programming languages 
Octave and Python are suitable to develop al¬ 
gorithms and execute them offline, their run¬ 
time environment is not suitable for real-time 
processing when low delay is required at high 
processing loads. Octave and Python do not 
give algorithm implementers the necessary con¬ 
trol to prevent heap memory allocation in the 
signal processing path, which can cause unpre¬ 
dictable interruptions in the real-time process¬ 
ing due to priority inversion situations. Pd and 
plugin hosts are real-time safe themselves and 
allow algorithms to be implemented in C or 
C++. The C and C++ programming languages 
allow developers sufficient control to implement 
algorithms in a real-time safe way. However, 
Pd and plugin hosts do not provide commonly 
needed services to hearing aid signal processing 
developers like calibration or an existing set of 
hearing aid algorithms. 

The HörTech Master Hearing Aid (MHA)
[Grimm et al., 2006; Grimm et al., 2009a] is
an existing software platform for hearing aid
algorithm development and evaluation that meets
all the requirements and has been used by the
hearing aid industry as well as in academic
research. Until recently, it was only available
as a closed-source commercial product. To
enable and facilitate collaborative research
efforts and comparative studies in the research
community, an open-source version of the MHA
software platform for real-time audio signal
processing is now being developed and made
available: the open Master Hearing Aid
(openMHA). In February 2017, a pre-release of
the openMHA was published on GitHub under an
open-source license (AGPL3) [HörTech gGmbH
and Universität Oldenburg, 2017]. This
pre-release features an initial set of reference
algorithms for hearing aid processing, which will
be expanded in subsequent releases. Thereby,
openMHA provides a growing benchmark for
the development and investigation of novel
algorithms on this platform in the future. With
the openMHA we provide an open-source tool
tailored to the needs of hearing aid algorithm
research, which was not previously available
as a specialized tool in the open-source domain.

Figure 1: Structure of the openMHA. The
openMHA contains a toolbox library
“libMHAToolbox”, a command line host
application, which acts as an openMHA plugin host
and provides the configuration interface, and
openMHA plugins.



2 Structure 

The openMHA can be split into four major
components (see Figure 1 for an overview):

1. The openMHA command line application

2. Signal processing plugins

3. Audio input-output (IO) plugins

4. The openMHA toolbox library

The openMHA command line application acts
as a plugin host. It can load signal processing
plugins as well as audio input-output (IO)
plugins. Additionally, it provides the command
line configuration interface and a TCP/IP based
configuration interface. Several IO plugins
exist: For real-time signal processing, commonly
the “MHAIOJack” plugin is used, which
provides an interface to the Jack Audio
Connection Kit (JACK) [Davis, 2003]. Other IO
plugins provide audio file access or TCP/IP-based
processing.

openMHA plugins provide the audio signal 
processing capabilities and audio signal han¬ 
dling. Typically, one openMHA plugin imple¬ 
ments one specific algorithm. The complete 
virtual hearing aid signal processing can be 
achieved by a combination of several openMHA 
plugins. 

The openMHA toolbox library “libMHATool- 
box” provides reusable data structures and sig¬ 
nal processing classes. Examples are class tem¬ 
plates for the implementation of openMHA plu¬ 
gins, and container classes for audio data. Fur¬ 
thermore, several filter classes in temporal or 
spectral domain, filter banks, and hearing aid 
specific classes are provided in this library. 

3 openMHA Platform Services and 
Conventions 

The openMHA platform offers some services 
and conventions to algorithms implemented in 
plugins, that make it especially well suited to 
develop hearing aid algorithms, while still sup¬ 
porting general-purpose signal processing. 

3.1 Audio Signal Domains 

As in most other plugin hosts, the audio signal 
in the openMHA is processed in audio chunks. 
However, plugins are not restricted to propa¬ 
gate audio signal as blocks of audio samples in 
the time domain - another option is to propa¬ 
gate the audio signal in the short time Fourier 
transform (STFT) domain, i.e. as spectra of 
blocks of audio signal, so that not every plugin 
has to perform its own STFT analysis and syn¬ 
thesis. Since STFT analysis and re-synthesis of 
acceptable audio quality always introduces an 


algorithmic delay, sharing STFT data is a ne¬ 
cessity for a hearing aid signal processing plat¬ 
form, because the overall delay of the complete 
processing has to be as short as possible. 

Similar to some other platforms, the
openMHA also allows arbitrary data to be
exchanged between plugins through a mechanism
called “algorithm communication variables”, or
“AC vars” for short. This mechanism is commonly
used to share data such as filter coefficients or
filter states.

3.2 Real-Time Safe Complex 
Configuration Changes 

Hearing aid algorithms in the openMHA can ex¬ 
port configuration settings that may be changed 
by the user at run time. To ensure real-time safe 
signal processing, the audio processing will nor¬ 
mally be done in a signal processing thread with 
real-time priority, while user interaction with 
configuration parameters would be performed 
in a configuration thread with normal priority, 
so that the audio processing does not get in¬ 
terrupted by configuration tasks. Two types of 
problems may occur when the user is changing 
parameters in such a setup: 

1. The change of a simple parameter exposed 
to the user may cause an involved recalcu¬ 
lation of internal runtime parameters that 
the algorithm actually uses in processing. 
The duration required to perform this re¬ 
calculation may be a significant portion of 
(or take even longer than) the time avail¬ 
able to process one block of audio signal. 
In hearing aid usage, it is not acceptable to 
halt audio processing for the duration that 
the recalculation may require. 

2. If the user needs to change multiple param¬ 
eters to reach a desired configuration state 
of an algorithm from the original configu¬ 
ration state, then it may not be acceptable 
that processing is performed while some of 
the parameters have already been changed 
while others still retain their original val¬ 
ues. It is also not acceptable to interrupt 
signal processing until all pending configu¬ 
ration changes have been performed. 

The openMHA provides a mechanism in its
toolbox library to enable real-time safe
configuration changes in openMHA plugins:
Basically, existing runtime configurations are
used in the processing thread until the work of
creating an updated runtime configuration has
been completed in the configuration thread. In
hearing aids, it is more acceptable to continue to
use an outdated configuration for a few more
milliseconds than to block all processing. The
openMHA toolbox library provides an easy-to-use
mechanism to integrate real-time safe runtime
configuration updates into every plugin.
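
The idea can be illustrated with a small sketch: the configuration thread builds a complete new runtime configuration and publishes it with a single reference swap, while the processing thread picks up whichever configuration is current at the start of each block. This is not the openMHA toolbox API; the class and method names are hypothetical, and Python itself offers no real-time guarantees, so the sketch only shows the hand-over pattern.

```python
import threading


class GainPlugin:
    """Hypothetical plugin with per-channel gains set in dB."""

    def __init__(self, gains_db):
        self._writer_lock = threading.Lock()  # serializes writers only
        self._runtime_cfg = self._build(gains_db)

    @staticmethod
    def _build(gains_db):
        # Stands in for a potentially expensive recalculation of the
        # internal parameters that processing actually uses.
        return tuple(10.0 ** (g / 20.0) for g in gains_db)

    def configure(self, gains_db):
        """Called from the configuration thread."""
        with self._writer_lock:
            new_cfg = self._build(gains_db)  # old cfg stays in use
            self._runtime_cfg = new_cfg      # publish all values at once

    def process(self, block):
        """Called from the processing thread; never waits for a lock."""
        cfg = self._runtime_cfg  # one consistent snapshot per block
        return [[s * g for s in channel]
                for channel, g in zip(block, cfg)]


plugin = GainPlugin([0.0, 0.0])
plugin.configure([20.0, -20.0])
print(plugin.process([[1.0], [1.0]]))
```

Because all user-visible parameters are folded into one immutable object before the swap, the processing thread never sees a state in which only some of the parameters have changed, which addresses both problems listed above.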

3.3 Plugins can Themselves Host Other 
Plugins 

An openMHA plugin can itself act as a plugin
host. This makes it possible to combine analysis
and re-synthesis methods in a single plugin. We
call plugins that can themselves load other
plugins “bridge plugins” in the openMHA. When
such a bridge plugin is called by the openMHA
to process one block of signal, it first performs
its analysis. It then invokes (as a function call)
the signal processing in the loaded plugin to
process the block of signal in the analysis
domain. When that function call returns, the
bridge plugin has received a processed block of
signal in the analysis domain back from the
loaded plugin. It then performs the re-synthesis
transform and finally returns the block of
processed signal in the original domain to the
caller of the bridge plugin.
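
The call sequence of a bridge plugin can be sketched as follows. The sketch uses a plain per-block DFT as the analysis transform, which is a simplification: it omits the windowing and overlap of a real STFT. All class and function names are hypothetical and do not correspond to the openMHA API.

```python
import cmath


def dft(block):
    """Naive discrete Fourier transform of one block (analysis)."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(block))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse transform back to the time domain (re-synthesis)."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * cmath.pi * k * i / n)
                 for k, c in enumerate(spectrum)) / n).real
            for i in range(n)]


class SpectralGain:
    """Hypothetical inner plugin that works on spectra."""

    def __init__(self, gain):
        self.gain = gain

    def process(self, spectrum):
        return [c * self.gain for c in spectrum]


class BridgePlugin:
    """Hosts one inner plugin and wraps it in analysis/re-synthesis."""

    def __init__(self, inner):
        self.inner = inner

    def process(self, block):
        spectrum = dft(block)                     # 1. analysis
        processed = self.inner.process(spectrum)  # 2. call loaded plugin
        return idft(processed)                    # 3. re-synthesis


bridge = BridgePlugin(SpectralGain(0.5))
result = bridge.process([1.0, 0.0, -1.0, 0.5])
```

The inner plugin never sees time-domain samples, so several spectral algorithms can be combined behind a single analysis and re-synthesis stage.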

3.4 Central Calibration 

The purpose of hearing aid signal processing is 
to enhance the sound for hearing impaired lis¬ 
teners. Hearing impairment generally means 
that people suffering from it have increased 
hearing thresholds, i.e. soft sounds that are au¬ 
dible for normal hearing listeners may be imper¬ 
ceptible for hearing impaired listeners. To pro¬ 
vide accurate signal enhancement for hearing 
impaired people, hearing aid signal processing 
algorithms have to be able to determine the ab¬ 
solute physical sound pressure level correspond¬ 
ing to a digital signal given to any openMHA 
plugin for processing. Inside the openMHA, 
we achieve this with the following convention: 
The single-precision floating point time-domain
sound signal samples that are processed inside
the openMHA plugins in blocks of short
duration have the physical pressure unit Pascal
(1 Pa = 1 N/m²). With this convention in place,
all plugins can determine the absolute physi¬ 
cal sound pressure level from the sound sam¬ 
ples that they process. A derived convention is 
employed in the spectral domain for STFT sig¬ 
nals. Due to the dependency of the calibration 
on the hardware used, it is the responsibility of 


the user of the openMHA to perform calibration 
measurements and adapt the openMHA settings 
to make sure that this calibration convention 
is met. We provide the plugin transducers (cf. 
section 4.1) which can be configured to perform 
the necessary signal adjustments in most situa¬ 
tions. 
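
Under this convention, a plugin can compute the sound pressure level of a block directly from its samples, using the standard reference pressure of 20 µPa. The following is a minimal sketch with a hypothetical function name; it is not taken from the openMHA toolbox.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # standard reference for dB SPL in air


def sound_pressure_level_db(block_pa):
    """RMS level in dB SPL of a block whose samples are in Pascal."""
    rms = math.sqrt(sum(p * p for p in block_pa) / len(block_pa))
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / REFERENCE_PRESSURE_PA)


# A signal with an RMS pressure of 1 Pa corresponds to about 94 dB SPL.
level = sound_pressure_level_db([1.0, -1.0, 1.0, -1.0])
print(f"{level:.1f} dB SPL")
```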


4 February 2017 Pre-Release 

In February 2017, HörTech and Universität
Oldenburg published a pre-release of the
openMHA on GitHub under an open-source
license (AGPL3). This pre-release contains the
openMHA command line application, the
toolbox library “libMHAToolbox”, an initial set
of openMHA plugins and openMHA sound
input/output (IO) libraries, and example
configurations. The initial set of plugins and
sound IO libraries was selected so that a basic
research hearing aid configuration can be
realized with the contained plugins, and users can
process both live sounds via JACK and sound
from and to files. The basic hearing aid
algorithms present in the pre-release include


• an adaptive differential microphone
algorithm that suppresses interfering noise
from the rear hemisphere (cf. section 4.3),

• a binaural coherence filter that provides
feedback suppression and dereverberation
(cf. section 4.5), and

• a multi-band dynamic range compression
algorithm that restores audibility of sounds
for the hearing impaired user (cf. section
4.7).


Apart from the plugins that implement just 
these algorithms, additional supporting plug¬ 
ins are contained in the pre-release that are re¬ 
quired to form a complete hearing aid imple¬ 
mentation. The contained plugins are briefly 
described in the following subsections. 

For real-time hearing aid processing, an 
input-output delay below 10 ms is required. 
This ensures that 


• the hearing-impaired user is not confused 
by asynchrony between lip movements of 
a conversation partner and the perceived 
sound, 

• no echo-effects are audible if the direct 
sound can also be perceived by the hear¬ 
ing aid user, and 
• fewer frequencies are available for possibly
annoying acoustic feedback loops [Grimm
et al., 2009a].

The example configuration that combines all
three example algorithms mentioned here shows
an algorithmic delay of 4.4 ms. On top of this
algorithmic delay, input and output of the sound
through a sound card causes additional delay in
the range of two to three block durations,
depending on the hardware in use. The example
configuration uses a block size of 64 samples at
a sampling rate of 44100 Hz. We have found
that, e.g., with the RME Multiface II sound card
and the snd-hdsp ALSA driver used by JACK,
this adds 4.4 ms of delay between acoustic input
and output on a Linux system with a low-latency
kernel and real-time priorities set up for JACK
and the ALSA sound driver.

This results in an overall delay of 8.8 ms for
the example configuration, which contains the
plugins described in the following subsections in
the order of their processing.
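
The delay budget can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (block size, sampling rate, and the two 4.4 ms contributions); the function names are hypothetical.

```python
def block_duration_ms(block_size, sample_rate):
    """Duration of one audio block in milliseconds."""
    return 1000.0 * block_size / sample_rate


def total_delay_ms(algorithmic_ms, io_ms):
    """Overall input-output delay as the sum of both contributions."""
    return algorithmic_ms + io_ms


one_block = block_duration_ms(64, 44100)  # about 1.45 ms
io_range = (2 * one_block, 3 * one_block)  # two to three blocks
overall = total_delay_ms(4.4, 4.4)         # values reported in the text

print(f"I/O delay range: {io_range[0]:.2f} to {io_range[1]:.2f} ms")
print(f"overall delay: {overall:.1f} ms, below 10 ms: {overall < 10.0}")
```

The measured I/O delay of 4.4 ms is close to the upper end of this range, i.e. roughly three block durations.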


4.1 The transducers Plugin 

A device-dependent calibration is required for 
plugins to be able to deduce the physical sig¬ 
nal level that is present at the hearing aid in¬ 
put. When connecting a microphone to a sound 
card and using that sound card to feed sound 
samples to the openMHA, these sound samples 
do not automatically follow the openMHA level 
convention outlined in section 13.41 The same 
is true when using sound files instead of sound 
cards for input and output. Different micro¬ 
phones have different sensitivities. Sound cards 
have adjustable amplification settings. Sound 
files may have been normalized before they have 
been saved to disk. To be able to implement 
the openMHA level convention, i.e., that the 
numeric value of time-domain sound samples in 
the openMHA should reflect their sound pres¬ 
sure amplitude in Pascal, we need to be able 
to adjust for arbitrary physical level to digital 
level mappings in the openMHA. This is done 
with the help of the plugin transducers, which 
is the only plugin that must not rely on this 
convention, because it is the one plugin that 
has to make sure that all other plugins can rely 
on this convention. For this reason, transducers
is usually loaded as the first plugin into
the openMHA, and will itself (i.e. as a bridge
plugin, cf. section 3.3) load another openMHA
plugin into the openMHA process. This other
plugin receives the calibrated input signal from
transducers, and it sends its processed but still
calibrated output signal back to the transducers
plugin to adjust for the physical outputs. The
transducers plugin provides filters and gain
adjustments to ensure calibration of inputs and
outputs. Typical output calibration values are in
the order of 110 dB SPL for a full-scale signal.
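
The level convention can be illustrated with a small numeric sketch. It is not openMHA code: the function names are hypothetical, and it assumes that a digital signal with an RMS value of 1.0 corresponds to the stated calibration level, with the usual 20 µPa reference for dB SPL.

```python
import math

P_REF = 20e-6  # reference sound pressure for dB SPL, in pascal


def calibration_factor(fullscale_db_spl):
    """Pressure in Pa that a digital value of 1.0 stands for (assumed RMS convention)."""
    return P_REF * 10.0 ** (fullscale_db_spl / 20.0)


def to_pascal(samples, fullscale_db_spl):
    """Scale raw digital samples so that their numeric value is a pressure in Pa."""
    k = calibration_factor(fullscale_db_spl)
    return [k * s for s in samples]


def level_db_spl(samples_pa):
    """RMS level in dB SPL of a block of samples already expressed in Pa."""
    rms = math.sqrt(sum(s * s for s in samples_pa) / len(samples_pa))
    return 20.0 * math.log10(rms / P_REF)


block = [1.0, -1.0, 1.0, -1.0]            # digital block with RMS 1.0
calibrated = to_pascal(block, 110.0)      # hypothetical 110 dB SPL calibration
print(round(level_db_spl(calibrated), 3))
```

With such a factor applied once at the input, every later processing stage can compute physical levels directly from the sample values.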

4.2 The mhachain Plugin 

An mhachain plugin can itself load several other 
plugins in a configurable order, where each plu¬ 
gin processes the output signal of the previous 
plugin. 

4.3 The Adaptive Differential
Microphone (adm) Plugin

Reduced audibility of soft sounds is not the only 
problem that hearing impaired listeners face 
when communicating. Another commonly ex¬ 
perienced problem is a reduced intelligibility of 
speech in noisy environments, even if the speech 
is loud enough to be perceived. Hearing aids 
therefore regularly employ signal processing al¬ 
gorithms to enhance the signal-to-noise ratio 
of speech in noisy environments. In this con¬ 
text assumptions about target and noise sources 
play an important role as well as robustness 
and generalization capabilities of the method 
used. Adaptive differential microphones (ADM,
[Elko and Pong, 1995]) aim at the preservation
of a target signal while suppressing background
noise. For this purpose, two general
assumptions are made: the target is assumed 
to be present in the frontal hemisphere of a 
listener, while noise occurs in the rear hemi¬ 
sphere. ADMs work for pairs of omnidirec¬ 
tional microphones separated by a small dis¬ 
tance, and combine a two-channel input to a 
single-channel output signal by adding up de¬ 
layed and weighted versions of the input as 
shown in Figure 2. In a binaural setting, two
independent, bilateral ADMs are realized, each
using a two-microphone pair located in the
hearing aid device on one ear.
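
The combination of delayed and weighted inputs can be sketched as follows. This is a minimal illustration in the spirit of a first-order differential structure, not the adm plugin itself: it assumes an integer inter-microphone delay T in samples, forms a front-facing and a back-facing difference signal, and subtracts the back-facing one with a weight beta. The block-wise least-squares choice of beta is an assumption made for the example.

```python
def delayed(x, T):
    """Delay a block by T samples, filling the start with zeros."""
    return [0.0] * T + x[:len(x) - T]


def adm(front, back, T=1, beta=None):
    """Combine two microphone signals into one output with a rear-facing null.

    c_f cancels sound arriving from the back, c_b cancels sound from the front.
    If beta is None it is estimated on the block and limited to [0, 1].
    """
    c_f = [f - b for f, b in zip(front, delayed(back, T))]
    c_b = [b - f for b, f in zip(back, delayed(front, T))]
    if beta is None:
        num = sum(a * b for a, b in zip(c_f, c_b))
        den = sum(b * b for b in c_b)
        beta = min(1.0, max(0.0, num / den)) if den > 0 else 0.0
    return [a - beta * b for a, b in zip(c_f, c_b)], beta


s = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
# Source behind the listener: the back microphone receives it T samples earlier.
rear_out, _ = adm(front=delayed(s, 1), back=s, T=1)
# Source in front of the listener: the front microphone receives it first.
front_out, _ = adm(front=s, back=delayed(s, 1), T=1)
print(rear_out, front_out)
```

In this idealized setting the rear source is cancelled completely, while the frontal source passes as a differentiated version of itself.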


4.4 The overlapadd Plugin 

overlapadd is one of the openMHA plugins that 
perform conversion between time domain and 
spectral domain as a service for algorithms that 
process a series of short time Fourier transform 
(STFT) signals. Thereby, not every openMHA 
plugin that processes spectral signal has to per¬ 
form its own spectral analysis. 

Figure 2: Adaptive differential microphone signal
flowchart. The input of the front and back
microphone is combined to a single-channel output
after applying a delay T and a weighting.

overlapadd is a bridge plugin (cf. section 3.3)
and performs both the forward and
the backward transform, and can load another
openMHA plugin which analyses and modifies
the signal while in the spectral domain.
The plugin performs the standard process of
collecting the input signal, windowing, zero-padding,
fast Fourier transform, inverse fast
Fourier transform, additional windowing, and
overlap-add time signal output. It can be used
in standard overlap-add (OLA) and weighted
overlap-add (WOLA) contexts.
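
The processing chain described above can be sketched in a few lines. This is an illustration only, not the overlapadd plugin: it assumes a square-root Hann window for both analysis and synthesis, 50 % overlap, no zero-padding, and a naive DFT so that the example needs nothing beyond the standard library. The `process` callback stands in for the plugin loaded by the bridge.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def wola(signal, process, frame=8):
    """Weighted overlap-add: window, transform, process, inverse, window, add."""
    hop = frame // 2
    # Square-root periodic Hann: analysis * synthesis windows sum to 1 at 50 % overlap.
    win = [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / frame)) for n in range(frame)]
    padded = [0.0] * hop + list(signal) + [0.0] * frame
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - frame + 1, hop):
        segment = [padded[start + n] * win[n] for n in range(frame)]
        spectrum = process(dft(segment))          # spectral-domain processing
        rebuilt = dft(spectrum, inverse=True)
        for n in range(frame):
            out[start + n] += rebuilt[n].real * win[n]
    return out[hop:hop + len(signal)]


x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.25, 3.0, 1.0, -2.0, 0.75]
y = wola(x, lambda spectrum: spectrum)            # identity processing
print(max(abs(a - b) for a, b in zip(x, y)))
```

With identity processing the output reproduces the input up to rounding error, which is a convenient check before inserting an actual spectral algorithm.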


4.5 The Binaural coherence Filter 
Plugin 

An important issue in hearing aid processing 
is the reduction of feedback that can occur be¬ 
tween the hearing aid receivers (outputs) and 
the closely located inputs (microphones). At 
high output levels a sound loop can emerge, 
causing annoying, self-sustaining beep tones. 

Binaural coherence filtering, i.e., coherence-based
gain control, is applied to reduce this effect
and enable higher gain levels of the hearing
device [Grimm et al., 2009b].

Figure 3 shows that the binaural coherence is
measured between the left and the right input 
signals to the hearing aids and used to derive 
frequency-dependent gains. 

Coherence filtering also contributes to noise 
and reverberation reduction, as diffuse, inco¬ 
herent background sounds are also reduced. 
A combination of the binaural coherence filtering
with preceding bilateral ADMs was shown to
be beneficial, i.e., to increase speech intelligibility
with a binaural hearing aid setup [Baumgartel
et al., 2015].


4.6 The fftfilterbank Plugin 

In the hearing impaired, the hearing loss generally
varies with frequency. To restore audibility
in hearing impaired listeners with amplification
and compression in hearing aid signal processing,
it is therefore common practice to amplify
and compress the signal differently in different
frequency bands, and let the time-varying input
level in the different frequency bands control
the gain selection. The fftfilterbank plugin receives
broadband spectra for each audio channel
and divides the incoming spectra into multiple
narrower frequency bands for processing by the
following openMHA algorithms. The fftfilterbank
provides flexibility for filter-bank design.
The output frequency bands may overlap or not,
with variable degrees of overlap, with customizable
filter shapes and different frequency scales
to specify the edge or center frequencies of the
filters.

Figure 3: Coherence filter signal flowchart. Binaural
coherence-based gain control is applied to
the left and the right input channel in different
frequency bands in the STFT domain.

Figure 4: Dynamic compression signal
flowchart. The input is split into frequency
bands by a filter-bank. Before re-synthesis, an
input-level dependent gain rule is applied.

4.7 Hearing Loss Compensation (dc)

The dc plugin applies multi-band dynamic
range compression [Grimm et al., 2015] to the
signal. This operation serves two important
purposes in a hearing aid: The hearing loss is
compensated by defining gain rules between input
and output level. Specific gain rules are
also used to compensate recruitment effects that
often come along with a hearing loss, i.e., a
decreased range between the percept of a soft
sound and the loudest sound with a still comfortable
level. To compensate for this effect, soft
input sounds are usually amplified with higher
gains than loud sounds. The dc plugin allows
specifying a gain matrix with different gains for
different frequencies and input sound levels. Input
sound levels in hearing aid frequency bands are
commonly measured with attack-release level
filters, the time constants of which can be freely
configured in the dc plugin. Figure 4 shows the
signal flow for dynamic compression with the
dc plugin. The dc plugin also allows configuring
binaural and inter-frequency interactions of
gain derivation.
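
The two building blocks named above, an attack-release level filter and a level-dependent gain rule, can be sketched as follows. This is a simplified illustration and not the dc plugin: the one-pole smoothing, the time constants, and the gain table values are assumptions chosen for the example.

```python
import math


def smoothing_coeff(tau_s, fs):
    """One-pole coefficient for a time constant tau_s (seconds) at sampling rate fs."""
    return math.exp(-1.0 / (tau_s * fs))


def attack_release_level(x, fs, tau_attack=0.005, tau_release=0.05):
    """Track the magnitude of x, rising with the attack and falling with the release constant."""
    c_att = smoothing_coeff(tau_attack, fs)
    c_rel = smoothing_coeff(tau_release, fs)
    level, out = 0.0, []
    for sample in x:
        mag = abs(sample)
        c = c_att if mag > level else c_rel
        level = c * level + (1.0 - c) * mag
        out.append(level)
    return out


def gain_db(level_db, levels_db, gains_db):
    """Piecewise-linear gain rule for one band, clamped outside the table."""
    if level_db <= levels_db[0]:
        return gains_db[0]
    if level_db >= levels_db[-1]:
        return gains_db[-1]
    for i in range(len(levels_db) - 1):
        l0, l1 = levels_db[i], levels_db[i + 1]
        if l0 <= level_db <= l1:
            g0, g1 = gains_db[i], gains_db[i + 1]
            return g0 + (g1 - g0) * (level_db - l0) / (l1 - l0)


# Hypothetical gain rule for one band: soft sounds get more gain than loud ones.
print(gain_db(50.0, [40.0, 60.0, 80.0], [30.0, 20.0, 5.0]))
```

A full gain matrix would hold one such row per frequency band; the level fed into the lookup would come from the attack-release filter of that band.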

4.8 The combinechannels Plugin 

Because the fftfilterbank splits broadband sig¬ 
nals into frequency bands for processing by the 
dc plugin, these frequency bands have to be re¬ 
combined to broadband channels again, after 
dc has processed them. This is done in the 
combinechannels plugin. Of course, the fftfilterbank
and combinechannels plugins could be
combined into a single bridge plugin (cf. section
3.3). This would generally be a better
implementation choice. It is not done here to
showcase the flexibility of the openMHA platform:
It is also possible to have analysis and resynthesis
of some transform as separate plugins,
and to propagate the signal from one plugin to
the next inside a single mhachain plugin while
the domain changes from one plugin to the next
(here: few broadband channels vs many narrow-band
channels).
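
The split and recombine steps can be illustrated with a toy spectrum. The sketch below is not the fftfilterbank or combinechannels implementation: it assumes rectangular, non-overlapping bands given by bin indices, and the function names are hypothetical.

```python
def rectangular_bands(num_bins, edges):
    """Band weights from ascending bin edges, e.g. [0, 2, 5, 8] gives three bands."""
    return [[1.0 if lo <= k < hi else 0.0 for k in range(num_bins)]
            for lo, hi in zip(edges, edges[1:])]


def split(spectrum, weights):
    """One broadband spectrum becomes one narrow-band spectrum per band."""
    return [[w * s for w, s in zip(band, spectrum)] for band in weights]


def combine(bands):
    """Sum the narrow-band spectra back into one broadband spectrum."""
    return [sum(values) for values in zip(*bands)]


spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bands = split(spectrum, rectangular_bands(8, [0, 2, 5, 8]))
band_gains = [2.0, 1.0, 0.5]                      # stand-in for per-band processing
processed = [[g * v for v in band] for g, band in zip(band_gains, bands)]
print(combine(processed))
```

Because the band weights sum to one in every bin, splitting and recombining without processing returns the original spectrum.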

5 Software

openMHA is a command line application with
no graphical user interface (GUI) of its own.
openMHA can be configured with command line
parameters, configuration files, interactively
over a network connection, or by a combination
of all three methods. The same text-based
configuration language is used in all three methods.
Special-purpose GUIs can be produced to
control the openMHA over the network connection.
Such GUIs can be produced in any programming
language or framework that is able to
connect to the openMHA over a TCP network
connection. Some special-purpose GUIs exist
for the closed-source MHA that also work with
the openMHA, but are not yet part of the first
open-source pre-release. GUIs will be added in
later releases of the openMHA.

5.1 Configuration Interface

The openMHA application itself and also its
plugins are controlled through a simple, text-based
configuration language. The language
allows hierarchical configuration similar to the
concept of Octave and Matlab structures. The
configuration language enables variable assignments,
queries, and loading and saving of configuration
files. Variables of different types (integers,
floating point and complex numbers,
strings) and dimensions (scalars, vectors, matrices)
are supported. For more details, please
refer to [Grimm et al., 2006].

5.2 Plugin Development

New plugins can be developed for the openMHA
by implementing a C++ class derived from a
generic base class, implementing the methods
and compiling it to a shared object. Together
with other helper classes provided by the MHA-Toolbox
library, out-of-the-box support for exporting
variables to the configuration interface
(cf. section 5.1) and for thread safe configuration
updates (cf. section 3.2) is available.

Simple plugins will usually output the signal
in the same domain (spectrum or waveform) as
the input domain. It is also possible to implement
domain transformations (from the time
domain to spectrum or vice versa) inside a plugin,
as well as change the number of audio channels,
and even the number of audio samples
per block and the sampling rate (e.g. for resampling).

A detailed manual for plugin development
and implementation will be provided with a
near-future release.


6 Conclusions 

The openMHA provides the means for sustainable
research on and development of hearing aid
processing algorithms and assistive hearing systems.
The software is further developed in the
project "Open community platform for hearing
aid algorithm research"; additionally, updates
based on the feedback of the research community
will be made. Future work will extend
the openMHA in several directions: The
set of reference algorithms will be expanded and
experimental algorithms will be included. Additional
hardware and operating systems will
be supported, i.e., real-time runtime support for
Beaglebone Black ARM and similar platforms,
as well as support for Windows operating systems.
Usability on different user levels will be
increased by preparing a GUI for the
pure application of the openMHA, e.g., in the
context of audiological measurements, and by
providing reference manuals for the configuration
as well as for the implementation of plugins, so
that users can realize and evaluate their own
algorithms and methods.

The openMHA is intended to serve as a platform
for extensive research and evaluations by
the community. A pre-release of the software in
its current version including example configuration
files as described here can be downloaded
via http://www.openmha.org.

Abstract

INScore is an environment for the design of augmented
interactive music scores open to conventional
and non-conventional use of the music notation.
The system was presented at LAC 2012
and has significantly evolved since, with improvements
oriented toward dynamic and animated notation.
This paper presents the latest features and notably
the dynamic time model, the events system, the
scripting language, the symbolic scores composition
engine, the network and Web extensions, the interaction
processes representation system and the set
of sensor objects.


Keywords 

INScore, music score, dynamic score, interaction. 

1 Introduction 

Contemporary music creation poses numerous 
challenges to the music notation. Spatialized 
music, new instruments, gesture based inter¬ 
actions, real-time and interactive scores, are 
among the new domains that are now com¬ 
monly explored by the artists. Common music 
notation doesn’t cover the needs of these new 
musical forms and numerous research and ap¬ 
proaches have recently emerged, testifying to 
the maturity of the music notation domain, in 
the light of computer tools for music notation 
and representation. Issues like writing spatialized
music [Ellberger et al., 2015], addressing
new instruments [Mays and Faber, 2014] or new
interfaces [Enstrom et al., 2015] (to cite just a
few), are now the subject of active research and
proposals.

Interactive music and real-time scores are also
representative of an expanding domain in the
music creation field. The advent of the digital
score and the maturation of the computer
tools for music notation and representation
constitute the foundation for the development of this
musical form, which is often grounded on non-traditional
music representation [Smith, 2015;
Hope et al., 2015] but may also use the common
music notation [Hoadley, 2012; Hoadley,
2014].


In order to address the notation challenges
mentioned above, INScore [Fober et al., 2010]
has been designed as an environment open
to non-conventional music representation (although
it supports symbolic notation), and
turned to real-time and interactive use [Fober et
al., 2013]. It is clearly focused on music representation
only and in this way, differs from tools
integrated into programming environments like
Bach [Agostini and Ghisi, 2012] or MaxScore
[Didkovsky and Hajdu, 2008].


INScore was already presented at LAC
2012 [Fober et al., 2012a]. It has significantly
evolved since and this paper introduces the
set of issues that have been more recently addressed.
After a brief recap of the system and
of the programming environment, we’ll present
the scripting language extensions and the symbolic
scores composition engine that provides
high level operations to describe real-time and
interactive symbolic scores composition. Next
we’ll describe how interaction processes representations
can be integrated into the music score
and how remote access is supported using the
network and/or Web extensions. Tablet and
smartphone support has led to the integration of
gestural interaction with a set of sensor objects
that will be presented. Finally, the time model,
recently extended, will be described.


2 The INScore environment 

INScore is an environment to design interactive
augmented music scores. It extends the music
representation to arbitrary graphic objects
(symbolic notation but also images, text, vector
graphics, video, signals representation) and
provides a homogeneous approach to manipulate
the score components both in the graphic
and time spaces.

It supports time synchronization in the
graphic space, which refers to the graphic representation
of the temporal relations between
components of a score, via a synchronization
mechanism and using mappings that express relations
between time and graphic space segmentations
(Fig. 1).





Figure 1: A graphic score (Mark Applebaum’s
graphic score Metaphysics of Notation) is
synchronized to a symbolic score. The picture
in the middle is the result of the
synchronization. The vertical lines express the
graphic-to-graphic relationships, which have
been computed by composing the objects’
common relations with the time space.


INScore has been primarily designed to be
controlled via OSC¹ messages. The format of
the messages consists of an OSC address followed
by a message string and 0 to n parameters
(Fig. 2).



Figure 2: INScore messages general format. 


Compared to object oriented programming, 
the address may be viewed as an object pointer, 
the message string as a method name and the 
parameters as the method parameters. For ex¬ 
ample, the message: 


/ITL/scene/score color 255 128 40 150

may be viewed as the following method call:

ITL[scene[score]]->color(255 128 40 150)

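
To make the message format concrete, the sketch below encodes such a message as a binary OSC packet using only the standard library. It follows the OSC 1.0 encoding rules (null-terminated strings padded to 4 bytes, big-endian 32-bit numbers); the helper names are hypothetical, and it assumes that the message string (here color) travels as the first string argument after the address.

```python
import struct


def osc_string(text):
    """OSC string: ASCII bytes, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Encode an OSC message with int32, float32 and string arguments."""
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError("unsupported argument type")
    return osc_string(address) + osc_string(tags) + payload


packet = osc_message("/ITL/scene/score", "color", 255, 128, 40, 150)
print(len(packet), packet[:16])
# Sending would use a UDP socket, e.g. (host and port are assumptions):
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, ("localhost", 7000))
```

The address plays the role of the object pointer, the first string argument that of the method name, and the remaining arguments are the method parameters.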
The system provides a set of messages for
the graphic space control (x, y, color, space,
etc.), for the time space control (date,
duration, etc.), and to manage the environment.
It includes two special messages:

¹ http://opensoundcontrol.org/


• the set message that operates like a constructor
and that takes the object type as
parameter, followed by type specific parameters
(Fig. 3).

• the get message provided to query the system
state (Fig. 4).


/ITL/scene/obj set txt "Hello world!"; 

Figure 3: A message that creates a textual
object, whose type is txt, with a text as
specific data.


/ITL/scene/obj get;
-> /ITL/scene/obj set txt "Hello world!";

/ITL/scene/obj get x y;
-> /ITL/scene/obj x 0;
-> /ITL/scene/obj y 0.5;

Figure 4: Querying an object with a get message 
gives messages on output (prefixed with ->). 
These messages can be used to restore the 
corresponding object state. 


The address space is dynamic and not limited
in depth. It is hierarchically organized: the first
level /ITL is used to address the application,
the second one /ITL/scene to address the score
and the next ones to address the components
of a score (note that scene is a default name
that can be user defined). Arbitrary hierarchies
of objects are supported.

3 The scripting language 

The OSC messages described above have been
turned into a textual version to constitute the
INScore scripting language. This language has
been rapidly extended to support:

• variables, that may be used to share parameters
between messages (Fig. 5).

• message based variables and/or parameters
that consist in querying an object to retrieve
one of its attribute values (Fig. 6).

• an extended OSC addressing scheme that
allows sending OSC messages to an external
application for initialization or control
purposes (Fig. 7).

• JavaScript sections that can be evaluated
at parsing and/or run time. A JavaScript
call is expected to produce INScore messages
as output (Fig. 8).

• mathematical expressions (+ - / *,
conditionals, etc.) that can be used for
arguments computation (Fig. 9).

• symbolic scores composition expressions
that are described in section 4.

greylevel = 140;
color = $greylevel $greylevel $greylevel;
/ITL/scene/obj1 color $color;
/ITL/scene/obj2 color $color;

Figure 5: Variables may be used to share values 
between messages. 


ox = $(/ITL/scene/obj get x); 
/ITL/scene/obj2 x $(/ITL/scene/obj get x); 

Figure 6: The output of get messages can be 
used by variables or as another message 
parameter. 


/ITL/scene/obj set txt "Hello world!"; 
localhost:8000/start; 

Figure 7: This script initialises a textual object 
and sends the /start message to an external 
application listening on UDP port 8000. 


4 Symbolic scores composition 

Rendering of symbolic music notation makes use
of the Guido engine [Daudin et al., 2009]. Thus
the primary music score description format is
the Guido Music Notation format [Hoos et al.,
1998] [GMN]. The MusicXML format [Good,
2001] is also supported via conversion to the
GMN format.

The Guido engine provides a set of operators
for scores level composition [Fober et al., 2012b].
These operators consistently take 2 scores as arguments
to produce a new score as output. They
allow putting scores in sequence (seq), in parallel
(par), cutting a score in the time dimension
(head, tail) or in the polyphonic dimension
(top, bottom), transposing (transpose) or
stretching (duration) a score, and applying the
rhythm or the pitch of a score to another one
(rhythm, pitch).

<?javascript
function randpos(address) {
    var x = (Math.random() * 2) - 1;
    return address + " x " + x + ";";
}
?>
/ITL/scene/javascript run
    'randpos("/ITL/scene/obj")';

Figure 8: The JavaScript section defines a
randpos function that computes an x
message with a random value, addressed to
the object given as parameter. This function
may be next called at initialization or at any
time using the static JavaScript node
embedded into each score.

/ITL/scene/o x ($shift ? $x + 0.5 : $x);

Figure 9: A mathematical expression is used to
compute the position of an object depending
on 2 previously defined variables.

The INScore scripting language includes score 
expressions, a simple language providing score 
composition operations. The novelty of the pro¬ 
posed approach relies on the dynamic aspects of 
the operations, as well as on the persistence of 
the score expressions. A score may be composed 
as an arbitrary graph of score expressions and 
equipped with a fine control over the changes 
propagation. 


4.1 Score expressions 

A score expression is defined as an operator followed
by two scores (Fig. 10). The leading expr
token is present to disambiguate parentheses in
the context of INScore scripts.


score expression: 



Figure 10: Score expressions syntax. 


The score arguments may be: 

• a literal score description string (GMN or 
MusicXML formats), 

• a file (GMN or MusicXML formats), 

• an existing score object,

• a score expression.

An example is presented in Fig. 11.

expr(par score.gmn (seq "[c]" score))

Figure 11: A score expression that puts a score
file (score.gmn) in parallel with the
sequence of a literal score and an existing
object (score). Note that the leading expr
token can be omitted inside an expression.


4.2 Dynamic score expression trees 

The score expressions language is first trans¬ 
formed into an internal tree representation. In a 
second step, this representation is evaluated to 
produce GMN strings as output, that are finally 
passed to the INScore object as specific data. 

Basically, the tree is reduced using a depth
first post-order traversal and the result is stored
in a cache. However, the score expressions language
provides a mechanism to make arbitrary
parts of a tree variable using an ampersand (&)
as prefix of an argument, preventing the corresponding
nodes from being reduced at cache level
(Fig. 12).

expr(par score.gmn (seq "[c]" &score))

Figure 12: A score expression that includes a 
reference to a score object. Successive 
evaluations of the expression may produce 
different results, provided that the score 
object has changed. 
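
The difference between cached and variable parts of an expression tree can be illustrated with a toy evaluator. This is not the INScore implementation: scores are plain strings, the seq and par operators only build placeholder text rather than real GMN, and the tuple-based node encoding ("lit", "ref", "dyn") is an assumption made for the sketch, with "dyn" standing for an argument prefixed with &.

```python
OPS = {
    "seq": lambda a, b: "seq(" + a + ", " + b + ")",
    "par": lambda a, b: "par(" + a + ", " + b + ")",
}


def has_dynamic(node):
    """True if the subtree contains a reference marked as variable (& prefix)."""
    if node[0] == "dyn":
        return True
    if node[0] in OPS:
        return has_dynamic(node[1]) or has_dynamic(node[2])
    return False


def evaluate(node, scores, cache):
    """Depth-first post-order reduction; only subtrees without & parts are cached."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "dyn":
        return scores[node[1]]            # always read the current object state
    if node in cache:
        return cache[node]
    if kind == "ref":
        result = scores[node[1]]          # copied once, then served from the cache
    else:
        result = OPS[kind](evaluate(node[1], scores, cache),
                           evaluate(node[2], scores, cache))
    if not has_dynamic(node):
        cache[node] = result
    return result


scores, cache = {"score": "[e]"}, {}
static = ("par", ("lit", "[g]"), ("seq", ("lit", "[c]"), ("ref", "score")))
dynamic = ("par", ("lit", "[g]"), ("seq", ("lit", "[c]"), ("dyn", "score")))
first = evaluate(static, scores, cache), evaluate(dynamic, scores, cache)
scores["score"] = "[f]"                   # the referenced score object changes
second = evaluate(static, scores, cache), evaluate(dynamic, scores, cache)
print(first, second)
```

After the change, only the expression containing the variable reference yields a new result; the other one keeps returning its cached value.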


The INScore events system (described in section
8.2) provides a way to automatically trigger the
re-evaluation of an expression when one of its
variable parts has changed. These mechanisms
open the door to dynamic scores composition
within the INScore environment. More details
about the score expressions language can be
found in [Lepetit-Aimon et al., 2016].


5 Musical processes representation 

INScore includes tools for the representation of 
musical processes within the music score. In 
the context of interactive music and/or when a 
computer is involved in a music performance, 
it may provide useful information regarding the 
state of the musical processes running on the 
computer. This feedback can notably be used to 
guide the interaction choices of the performer. 


On the INScore side, a process state is viewed as
a signal. Signals are part of a score’s components
and can be combined into graphic signals to become
first order objects of a score. They may
be notably used for the representation of a performance
[Fober et al., 2012a].

Regarding musical processes representation,
a signal can be connected to any attribute of an
object (Fig. 13), which makes the signal variations
(and thus the process activity) visible through
the changes of the corresponding attributes.

/ITL/scene/signal/sig size 100;
/ITL/scene/obj set rect 0.5 0.5;
/ITL/scene/img set img 'file.png';
/ITL/scene/signal connect sig
    "obj:scale" "img:rotatez[0,360]";

Figure 13: A signal sig is connected to the 
scale attribute of an object and to the 
rotatez attribute of an image. Note that for 
the latter, the signal values (expected to be 
in [-1,1]) are scaled to the interval [0,360]. 
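
The range conversion mentioned in the caption amounts to a linear mapping. The sketch below assumes that the scaling is linear and that out-of-range signal values are clamped; neither detail is stated above, and the function name is hypothetical.

```python
def scale_signal(value, low, high, src_low=-1.0, src_high=1.0):
    """Map a signal value from [src_low, src_high] linearly onto [low, high]."""
    value = min(src_high, max(src_low, value))    # clamping is an assumption
    return low + (value - src_low) * (high - low) / (src_high - src_low)


for v in (-1.0, 0.0, 0.5, 1.0):
    print(v, scale_signal(v, 0.0, 360.0))
```

Under these assumptions a signal value of 0 rotates the image by 180 degrees and a value of 1 by a full turn.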


6 Network and Web dimensions 

INScore supports the aggregation of distributed
resources over the Internet, as well as the publication
of a score via the HTTP and/or WebSocket
protocols. In addition, a score can also be used
to control a set of remote scores on the local
network using a forwarding mechanism.

6.1 Distributed score components 

Most of the components of a score can be defined
in a literal way or using a file. All the file
based resources can be specified as a simple file
path, using absolute or relative path, or as an
HTTP url (Fig. 14).

/ITL/scene/obj1 set img 'file.png';
/ITL/scene/obj2 set img
    'http://www.adomain.org/file.png';

Figure 14: File based resources can refer to local 
or to remote files. 


When using a relative path, an absolute path
is built using the current path of the score, that
may be set to an arbitrary location using the
rootPath attribute of the score (Fig. 15).

The current rootPath can also be set to an
arbitrary HTTP url, so that further use of a
relative path will result in an url (Fig. 16).
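
The resolution rule described above can be sketched with standard library helpers. The function below is a hypothetical illustration of the rule, not INScore code: absolute paths and full urls are kept as they are, and relative paths are joined to the rootPath, which may itself be a directory or an HTTP url.

```python
import posixpath
from urllib.parse import urljoin


def resolve(resource, root_path):
    """Resolve a resource name against a rootPath given as a directory or an HTTP url."""
    is_url = resource.startswith(("http://", "https://"))
    if is_url or posixpath.isabs(resource):
        return resource                           # already absolute
    if root_path.startswith(("http://", "https://")):
        return urljoin(root_path.rstrip("/") + "/", resource)
    return posixpath.join(root_path, resource)


print(resolve("file.png", "/users/me/inscore"))
print(resolve("file.png", "http://www.adomain.org"))
```

The same relative name thus designates a local file or a remote one, depending only on the rootPath in effect.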










LAC2017 - CIEREC - GRAME - Universite Jean Monnet - Saint-Etienne - France 




/ITL/scene rootPath '/users/me/inscore';
/ITL/scene/obj set img 'file.png';

Figure 15: The rootPath of a score is equivalent
to the current directory in a shell. With this
example, the system will look for the file at
'/users/me/inscore/file.png'

/ITL/scene rootPath 'http://www.adomain.org';
/ITL/scene/obj set img 'file.png';

Figure 16: The rootPath supports urls. With
this example, the system will look for the file
at 'http://www.adomain.org/file.png'

This mechanism allows mixing local and remote
resources in the same music score, but also
expressing local and remote scores in a similar
way, just using a rootPath change.

6.2 HTTPd and WebSocket objects

A music score can be published on the Internet
using the HTTP or the WebSocket protocols.
Specific objects can be embedded in a
score in order to make this score available to
remote clients (Fig. 17).

/ITL/scene/http set httpd 8000;
/ITL/scene/ws set websocket 8100 200;

Figure 17: This example creates an httpd server
listening on the port 8000 and a WebSocket
server listening on the port 8100 with a
maximum notification rate of 200 ms.

The WebSocket server allows bi-directional
communication between the server and the
client. It sends notifications of score changes
each time the graphic appearance of the score
is modified, provided that the notification rate
is lower than the maximum rate set at server
creation time.

The communication scheme between a client
and an INScore Web server relies on a reduced
set of messages. These messages are protocol
independent and are equally supported over
HTTP or WebSocket:

• get: requests an image of the score.

• version: requests the current version of the
score. The server answers with an integer
value that is increased each time the score
is modified.

• post: intended to send an INScore script to
the server.

• click: intended to allow remote mouse interaction
with the score.


More details are available from [Fober et al.,
2015].

6.3 Messages forwarding 

Message forwarding is another mechanism provided
to distribute scores over a network. It operates
at application and/or score levels, depending
on whether the forward message is sent to the
application (/ITL) or to a score (/ITL/scene). The
message takes a list of destination hosts specified
using a host name or an IP address, and
suffixed with a port number. All the OSC messages
may be forwarded, provided they are not
filtered out (Fig. 18). The filtering strategy
is based on OSC addresses and/or on INScore
methods (i.e. messages addressing specific object
attributes).

/ITL forward 192.168.1.255:7000;
/ITL/filter reject
    '/ITL/scene/javascript';


Figure 18: The application is requested to 

forward all messages on INScore port (7000) 
to the local network using a broadcast 
address. Messages addressed to the 
JavaScript engine are filtered out in order to 
only forward the result of their evaluation. 


7 The sensor objects 

INScore runs on the major operating systems
including Android and iOS. Tablet and smartphone
support has led to the integration of gestural
interaction through a set of sensors (Table 1).

Sensors can be viewed as objects or as signals. 
When created as a signal node, a sensor behaves 
like any signal but may provide some additional 
features (like calibration). When created as a 
score element, a sensor has no graphical appear¬ 
ance but provides specific sensor events and fea¬ 
tures. 

Not all the sensors are likely to be available on a
given device. In case a sensor is not supported,
an error message is generated at creation request
and the creation process fails.

8 The time model 

The INScore time model has been recently extended
to support dynamic time. Indeed, with the
initial design, the time attributes of an object
are fixed and don’t change unless a time message
(date, duration) is received, which can
only be emitted from an external application or
using the events mechanism. The latter (defined
very early) introduced another notion of
time: the events time, which takes place when
an event occurs. The events system has also
been extended for more flexibility.

name            values
accelerometer   x, y, z
ambient light   light level
compass         azimuth
gyroscope       x, y, z
light           a level in lux
magnetometer    x, y, z
orientation     device orientation
proximity       a boolean value
rotation        x, y, z
tilt            x, y

Table 1: The set of sensors and associated values

/ITL/scene/obj tempo 60
-> /ITL/scene/obj ddate f(t_i)
-> /ITL/scene/obj ddate f(t_{i+1})
-> /ITL/scene/obj ddate f(t_{i+2})
-> ...

Figure 19: A sequence of messages that activate
the time of an object obj. Messages
prefixed by -> are generated by the object
itself. t_i is the value of the absolute time
elapsed between the tasks i and i - 1.

8.1 The musical time 

Regarding the time domain, any object of a
score has a date and a duration. A new tempo
attribute has been added, which has the effect of
moving the object in the time dimension when
non null, according to the tempo value and the
absolute time flow. Let $t_0$ be the time of the last
tempo change of an object and let $v$ be the tempo
value; the object date $d_t$ at a time $t$ is given by
a time function $f$:

$$f(t) = d_t = d_{t_0} + (t - t_0) \times v \times k, \quad t \geq t_0 \qquad (1)$$

where $d_{t_i}$ is the object date at time $t_i$ and $k$
a constant to convert absolute time in musical
time. In fact, absolute time is expressed in milliseconds
and the musical time unit is the whole
note. Therefore, the value of $k$ is $1/(1000 \times 60 \times 4)$.

Each object of a score has an independent 
tempo. The tempo value is a signed integer, 
which means that an object can move forward 
in time but backward as well. 

From an implementation viewpoint and when its
tempo is not null, an object sends ddate (a relative
displacement in time) to itself at periodic
intervals (Fig. 19).

This design is consistent with the overall sys¬ 
tem design since it is entirely message based. It 
is thus compatible with all the INScore mecha¬ 
nisms such as the forwarding system. 
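
Equation (1) can be checked with a short worked example. The sketch uses exact fractions to avoid rounding; the function names and the 10 ms period used for the relative displacements are assumptions, since the interval between ddate messages is not specified above.

```python
from fractions import Fraction

# Milliseconds to whole notes, for a tempo given in quarter notes per minute.
K = Fraction(1, 1000 * 60 * 4)


def date_at(t_ms, t0_ms, d_t0, tempo):
    """Object date (in whole notes) at absolute time t_ms, following equation (1)."""
    if t_ms < t0_ms:
        raise ValueError("equation (1) is only defined for t >= t0")
    return d_t0 + (t_ms - t0_ms) * tempo * K


def ddate_steps(tempo, period_ms, count):
    """Relative displacements an object would send to itself every period_ms."""
    return [period_ms * tempo * K for _ in range(count)]


print(date_at(4000, 0, Fraction(0), 60))      # 4 s at tempo 60
print(date_at(2000, 0, Fraction(1), -60))     # negative tempo moves backward
print(sum(ddate_steps(60, 10, 3)))
```

At tempo 60, four seconds of absolute time advance the object by one whole note, and a negative tempo moves its date backward at the same rate.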


8.2 The events system 

The event-driven approach of time in INScore
preceded the musical time model and has been
presented in [Fober et al., 2013]. The event-based
interaction process relies on messages
that are associated to events and that are sent
when the corresponding event occurs. The general
format of an interaction message is described
in Fig. 20.

address watch event ( messages )

Figure 20: Format of an interaction message: the
watch request installs a messages list
associated to the event event.

Initially, the events typology was limited to classical user interface
events (e.g. mouse events), extended in the time domain (see Table 2).

    Graphic domain    Time domain
    mouseDown         timeEnter
    mouseUp           timeLeave
    mouseEnter        durEnter
    mouseLeave        durLeave
    mouseMove

Table 2: Main INScore events in the initial versions.

This typology has been significantly extended 
to include: 

• touch events (touchBegin, touchEnd, 
touchUpdate), available on touch screens 
and supporting multi-touch. 

• any attribute of an object: modifying an object attribute may trigger
the corresponding event, which carries the name of the attribute (e.g.
x, y, date, etc.).

• an object's specific data, i.e. data defined with a set message. The
event name is newData and has been introduced for the purpose of the
symbolic score composition system.

• user defined events, which have to comply with a special naming
scheme.


Any event can be triggered using the event message, followed by the
event name and the event-dependent parameters. The event message may be
viewed as a function call that generates OSC messages on output. This
approach is particularly consistent for user events that can take an
arbitrary number of parameters, which are then available to the
associated messages in the form of variables named $1...$n (Fig. 21).



Figure 22: Example of events placed in the time space. These events are
associated with time intervals (timeEnter and timeLeave) and are
triggered when entering (in red) or leaving (in blue) these intervals.
The last event (e6) emits a date message that creates a loop by putting
the object back at the beginning of the first interval.


    /ITL/scene/obj watch MYEVENT (
        /ITL/scene/t1 set txt $1,
        /ITL/scene/t2 set txt $2
    );

    /ITL/scene/obj event MYEVENT
        "This text is for t1"
        "This one is for t2";

Figure 21: Definition of a user event named MYEVENT that expects 2
arguments referenced as $1 and $2 in the body of the definition. This
event is next triggered with 2 different strings as arguments.
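The "function call" reading of user events in Fig. 21 can be illustrated with a small sketch that expands the $1...$n variables of the watched messages with the arguments given to the event message. This is plain text substitution written for illustration only; the function name `expand_event_messages` is hypothetical and the code makes no claim about how INScore itself parses or evaluates messages.

```python
import re

def expand_event_messages(messages, args):
    """Replace $1..$n in each message template by the matching event argument."""
    def substitute(match):
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise ValueError(f"event called without argument ${index}")
        return f'"{args[index - 1]}"'
    return [re.sub(r"\$(\d+)", substitute, m) for m in messages]

# Messages installed by the watch request of Fig. 21.
watched = [
    "/ITL/scene/t1 set txt $1",
    "/ITL/scene/t2 set txt $2",
]
# Arguments carried by the event message.
for line in expand_event_messages(watched, ["This text is for t1", "This one is for t2"]):
    print(line)
```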


The time dimension of the events system makes it possible to place
functions in the time space in the form of events that trigger messages
that can modify the score state and/or be addressed to external
applications using the extended OSC addressing scheme (Fig. 22).

Combined with the dynamic musical time, this events system makes it
possible to describe autonomous animated scores. The example in Fig. 23
shows how to describe a cursor that moves forward and backward over a
score by watching the time intervals that precede and follow a symbolic
score and by inverting the tempo value.
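The bouncing behaviour of the cursor in Fig. 23 can be approximated with a short simulation. The sketch below rests on several assumptions that are not stated in the text: time intervals are treated as closed on the left and open on the right, the periodic ddate displacement is sent every 100 ms, and the tempo counts quarter notes per minute. The names `simulate_cursor` and `watches` are hypothetical.

```python
from fractions import Fraction

K = Fraction(1, 1000 * 60 * 4)   # ms x tempo -> whole notes

def simulate_cursor(duration_ms, step_ms=100):
    """Step a cursor through time, flipping its tempo on timeEnter events."""
    # (watched interval, tempo set when the interval is entered), as in Fig. 23
    watches = [((2, 3), -60), ((-1, 0), 60)]
    date, tempo = Fraction(0), 60
    trace = []
    for now in range(step_ms, duration_ms + 1, step_ms):
        previous = date
        date += step_ms * tempo * K            # the periodic ddate displacement
        for (start, end), new_tempo in watches:
            entered = start <= date < end and not start <= previous < end
            if entered:
                tempo = new_tempo
        trace.append((now, date, tempo))
    return trace

trace = simulate_cursor(17000)
print(max(date for _, date, _ in trace))   # 2
print(min(date for _, date, _ in trace))   # -1/40
```

With these assumptions the cursor reaches the end of the two-whole-note score after 8 seconds, turns back, overshoots the start by one step and then moves forward again.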


    # first clear the scene
    /ITL/scene/* del;

    # add a simple symbolic score
    /ITL/scene/score set gmn '[c d e f g a h c2]';

    # add a cursor synchronized to the score
    /ITL/scene/cursor set ellipse 0.1 0.1;
    /ITL/scene/cursor color 0 0 250;
    /ITL/scene/sync cursor score syncTop;

    # watch different time zones
    /ITL/scene/cursor watch timeEnter 2 3
        ( /ITL/scene/cursor tempo -60 );
    /ITL/scene/cursor watch timeEnter -1 0
        ( /ITL/scene/cursor tempo 60 );

    # and finally start the cursor time
    /ITL/scene/cursor tempo 60;

Figure 23: A cursor that moves forward and backward over a symbolic
score.

9 Conclusion

INScore (http://inscore.sf.net) is an ongoing open source project that
crystallizes a significant amount of research addressing the problems
of music notation and representation with regard to contemporary music
creation. It is used in artistic projects, and many of the concrete
experiences raised new issues that are reflected in some of the system
extensions. The domain is quite recent and there are still a lot of
open questions that we plan to address in future work, in particular:

• turning the scripting language into a real programming language would
provide a more powerful approach to music score description. The
embedded JavaScript engine may already be used for an algorithmic
description of a score, but switching from one environment (INScore
script) to another one (JavaScript) proved to be a bit tedious.

• extending the score components to give a time dimension to any of
their attributes could open a set of new possibilities, including
arbitrary representations of the passage of time.

Finally, migrating the INScore native environment to the Web is part of
the current plans and should also open new perspectives, notably due to
the intrinsic connectivity of Web applications.
Abstract

PlayGuru is a practice tool being developed for beginning and
intermediate musicians. With exercises that adapt to the musician, the
intention is to help a music student to develop several playing skills
and motivate them to practice in-between classes. Because all exercise
interaction is entirely based on sound, the author believes PlayGuru is
particularly useful for blind and visually impaired musicians. Research
currently focuses on monophonic exercises. This paper is a report of
the current status and ultimate goals.

Keywords 

Computer-assisted music education, DSP, machine 
learning 

1 Project objectives 

The main reason for writing this paper is to bring the project to the
attention of others so they can use, improve and benefit from the
ideas, technology and the intended end product.

Even though few user experiences can be reported at the time of
writing, some intermediate results and plans for the near future are
given towards the end of this paper.

PlayGuru is a music tutor that operates exclusively in the sound
domain. Because the focus is only on music, the author believes it is a
very useful tool for beginning and intermediate musicians and
particularly useful for blind and visually impaired musicians.

1.1 Practice motivation 

How does an amateur musician find the motivation to pick up their
instrument and play? What motivates children to practice?

These questions probably have a large spectrum of answers. Let us focus
on two motivational forces: personal growth and affirmation.

PlayGuru is a set of music exercises based on an example and response
approach. Dialogs usually start with an example being played and base
the next action upon the response of the musician. The example and
response can also be played simultaneously, creating a sense of playing
together. Those synchronised exercises give room to improvisation and
exploration.

The way in which PlayGuru aims to keep the user motivated is based on
affirmation when the exercise is performed according to predefined
objectives.

It will never flag a “wrong note”, as this is considered highly
demotivating. Instead, it responds by making the exercise slightly
easier when it finds you are struggling with the current level, until
it gets to a level from which you can continue growing again.

The most important affirmative motivators 
used are: 


• increasing playing speed 

• extending the phrase 

• increasing complexity of the exercise 
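The first of these motivators, together with the "slightly easier when struggling" rule described above, might be sketched as follows. This is a hypothetical illustration and not PlayGuru's code: the function name `next_tempo`, the success ratio input, the thresholds and the step size are all assumptions chosen for the example.

```python
def next_tempo(tempo_bpm, success_ratio, step_bpm=4,
               raise_above=0.9, lower_below=0.6,
               min_bpm=40, max_bpm=200):
    """Return the playback tempo for the next repetition of an exercise.

    success_ratio is an assumed score between 0 and 1 for the last response.
    """
    if success_ratio >= raise_above:
        tempo_bpm += step_bpm          # affirmation: play it a little faster
    elif success_ratio < lower_below:
        tempo_bpm -= step_bpm          # struggling: quietly ease off
    # stay within a playable range
    return max(min_bpm, min(max_bpm, tempo_bpm))

print(next_tempo(80, 0.95))   # 84
print(next_tempo(80, 0.50))   # 76
print(next_tempo(80, 0.75))   # 80
```

Note that no response is ever reported as wrong: a weak response only changes the next repetition.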


The current version contains a very basic user model, which is a
starting point for a module that monitors a user’s achievements and
keeps track of their progress, thus supporting their personal growth.

To perform user tests supporting the research, several exercises are
being implemented. At the time of writing, a versatile sound-domain
guitar tuner with arbitrary tuning is available, as is an exercise for
remembering a melodic phrase and a riff trainer for practicing licks at
high speed.

Most exercises are developed for the guitar. A great source of
inspiration for guitar practice is the book by Scott Tennant
[Tennant, 1995], with its focus on motor skills and automation.

Because all exercises are based on pitch and onset detection in sound
signals, adaptation for other instruments with discrete pitch and a
clear onset should be reasonably straightforward. For instruments with
arbitrary pitch ranges and smooth transitions, as well as the human
voice, some additional provisions may be necessary.

1.2 Origin of the project 

The PlayGuru project started as part of the author’s Master’s course.
While trying to find ways to improve the effectiveness of practice
routines for the guitar, some research was done into existing
solutions.

Several systems for computer assisted music training were found and
some have been put to the test. A shortlist is included at the end. One
thing all the encountered solutions have in common is that they rely
heavily on visual interaction. In many cases this implies written score
or tablature, in other cases a game-like environment in which the user
has to act upon events happening in a graphic scene.

In several cases the author found the visual 
information distracting from the music. Thus 
the idea arose for a practice tool exclusively 
working with sound. 

Shortly after that, the foundation Connect2Music¹ came into view.
Connect2Music, founded in 2013, provides information with respect to
music practice by visually impaired musicians.

According to [Mak, 2015], the facilities for blind music students in
The Netherlands are limited. Even though the situation is improving, a
practice tool which focuses only on the music itself would be a much
wanted addition.

Thus a project was born: to find ways to improve the learning path for
beginning and intermediate musicians with music as the key element and
primarily addressing blind and visually impaired people.

The prototype being developed for performing this research is called
PlayGuru. The envisioned end product aims to help and encourage a music
student to perform certain exercises in-between music classes and is
meant to complement rather than replace regular classes from a human
teacher.

Through the contacts of Connect2Music with the community, several blind
and visually impaired musicians and software developers in The
Netherlands and Belgium expressed their interest in this project and
offered help to assess and assist.


1 https://www.connect2music.nl 


2 Research 

To support the research with experiences of end users, some application
prototypes are being developed. The adaptive exercises used in these
prototypes will briefly be introduced separately.

This section discusses the software and the chosen methods for
interacting with the user.

2.1 Software 

The framework and all exercises are currently implemented in C++11. For
audio signal analysis, the Aubio² library is used. Exercises are
composed in real time according to musical rules or taken from existing
material like MIDI files and guitar tablature.

2.2 Dependencies 

Development is done on Linux and Raspbian. Porting to Apple OS X should
be relatively easy but has not been done yet. The most prominent
library dependencies are jackd, aubio, fftw3 and portmidi. For
generating sound, FluidSynth is used. Stand-alone versions use a Python
script to connect the hardware user interface to the exercises.

2.3 Practice companion 

When playing along with a song on the radio you will need to adjust to
the music you hear, as it will not wait for you. Playing with other
musicians has entirely different dynamics. People influence each other,
try to synchronise, tune in and reach a common goal: to make the music
sound nice and feel good about it.

When practicing music with a tool like PlayGuru it would be nice to
have a dialog with the tool, instead of just obeying its rules. This is
exactly what PlayGuru aims for in its interaction. It listens to you
and adapts, thus behaving like a practice companion.

How this is achieved is shown with reference to the software
architecture, with indications of which parts have been realised and
which are being developed.

2.4 Architecture 

The modular design of PlayGuru is shown in figure 1, with the Exercise
Governor as the module from which every exercise starts.

The Exercise Governor reads a configuration file containing information
about the user, the type of exercise, parameters defining the course of
the exercise, various composition settings and possibly other sources
like MIDI files.

2 https://aubio.org

Figure 1: PlayGuru’s software architecture

When the exercise starts, the Composer will generate a MIDI track, or
read it from the specified file. During the exercise, this generation
may take place again, depending on the type of exercise and the
progress of the musician.

The MIDI track is played by the sound module, which also captures the
sound that comes back from the musician or their instrument.

The Assessor contains all the sound processing and assessment logic and
reports back to the Exercise Governor, which calls in the help of the
User Model to decide how to interpret the data and what to do next.

2.5 Playback and analysis 

Playback and analysis run in separate threads, but share the same time
base for relating the output to the input.

Incoming audio is analysed in real time to detect pitch(es) and onsets,
which are used to assess the musician’s play in relation to the given
stimuli. Pitch and onset detection are done using the Aubio library.

2.6 The Exercise Governor 

Every exercise type is currently implemented as a separate program. The
Exercise Governor is essentially a descriptive name for the main
program of each exercise, which uses those parts from the other modules
that it needs for a certain exercise. Currently these are compiled and
linked into the program. With these building blocks it determines the
nature of the exercise.

As an example: to let the musician work on accurate reproduction of a
pre-composed phrase, the Exercise Governor will ask the Composer to
read a MIDI file, call the MIDI play routine from the Sound module,
then let the Assessor assess the user’s response and consult the User
Model, given the Assessor’s data, for determining the next step.

For a play-along exercise using generated melodies, the Exercise
Governor uses routines from the same modules, but with a different
intention. In this case it would let the Composer create a new phrase
when needed, ask the Assessor to run a different type of analysis and
perform concurrent scheduling of playback and analysis.

Figure 2: Note onset absolute time difference (plot titled "Match
timing")

2.7 The Assessor 

PlayGuru’s exercises generally consist of a play and evaluation loop.
For some exercises, the evaluation process is run after the playback
iteration, while for others they run simultaneously.

Figure 2 shows the measured timing of a musician playing along with an
example melody. The time of each matched note is compared to the time
when that note was played in the example. In this chart we see that the
musician played slightly “before the beat”.

This absolute timing indicates whether the musician is able to exactly
copy the example, which can be seen clearly for a melody consisting of
only equidistant notes.

More interesting however is relative timing. This indicates whether the
musician keeps the timing structure of the example intact. In this case
we calculate the differences in spacing of the onsets of successive
notes, either numerically or as an indication of
“smaller/equal/larger”, and compare the result to the structure of the
example. This is shown in figure 3. Here we can see that the musician
started out with confidence and needed more time to find the last notes
of the phrase. The example consisted of equidistant notes, which would
result in a chart of zeros and is therefore omitted.

Figure 3: Note onset relative time difference
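The two timing measures can be written down in a few lines. The sketch below assumes that the played notes have already been matched one-to-one with the example notes and that onsets are given in milliseconds; the function names and the tolerance used for the coarse "smaller/equal/larger" view are hypothetical choices for this illustration, not taken from PlayGuru.

```python
def absolute_timing(example_onsets, played_onsets):
    """Per-note difference between played and example onsets (negative = early)."""
    return [p - e for e, p in zip(example_onsets, played_onsets)]

def relative_timing(example_onsets, played_onsets):
    """Difference between the inter-onset intervals of the response and the example."""
    def intervals(onsets):
        return [b - a for a, b in zip(onsets, onsets[1:])]
    return [p - e for e, p in zip(intervals(example_onsets), intervals(played_onsets))]

def sign_pattern(deltas, tolerance):
    """Coarse 'smaller/equal/larger' view of the relative timing."""
    return ["equal" if abs(d) <= tolerance else ("larger" if d > 0 else "smaller")
            for d in deltas]

example = [0, 500, 1000, 1500, 2000]      # equidistant notes, in ms
played = [-20, 470, 960, 1530, 2120]      # early at first, slower at the end

print(absolute_timing(example, played))                    # [-20, -30, -40, 30, 120]
print(relative_timing(example, played))                    # [-10, -10, 70, 90]
print(sign_pattern(relative_timing(example, played), 20))  # ['equal', 'equal', 'larger', 'larger']
```

The relative view ignores a constant offset such as playing consistently "before the beat", which is why it says more about whether the rhythmic structure was kept.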

2.8 The Composer 

The Composer generates musical phrases based on given rules. The
current implementation uses melodic intervals in a specified range,
with an allowed set of intervals and within a given scale. The scale is
listed as a combination of tones (T) and semitones (S), as in the
examples in table 1, and can be specified as needed.

    Steps                    Scale
    T,T,S,T,T,T,S            major
    T,S,T,T,T,S,T            minor
    S,S,S,S,S,S,S,S,S,S,S    12-tone

Table 1: Scale examples
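A minimal sketch of how such a step pattern could be turned into pitches and then into a phrase is given below. It is not the Composer's implementation: the names `scale_pitches` and `generate_phrase` are hypothetical, pitches are assumed to be MIDI note numbers, and the allowed intervals are assumed to be counted in scale degrees.

```python
import random

STEP_SEMITONES = {"T": 2, "S": 1}

def scale_pitches(root, pattern):
    """Expand a 'T,T,S,...' step pattern into MIDI note numbers starting at root."""
    pitches = [root]
    for step in pattern.split(","):
        pitches.append(pitches[-1] + STEP_SEMITONES[step.strip()])
    return pitches

def generate_phrase(scale, length, allowed_steps, rng):
    """Random walk over scale degrees using only the allowed interval sizes."""
    index = rng.randrange(len(scale))
    phrase = [scale[index]]
    while len(phrase) < length:
        # keep only the moves that stay inside the specified range
        candidates = [index + s for s in allowed_steps if 0 <= index + s < len(scale)]
        index = rng.choice(candidates)
        phrase.append(scale[index])
    return phrase

major = scale_pitches(60, "T,T,S,T,T,T,S")
print(major)   # [60, 62, 64, 65, 67, 69, 71, 72]
print(generate_phrase(major, 8, (-2, -1, 1, 2), random.Random(1)))
```

Passing a seeded `random.Random` keeps a generated phrase reproducible, which may be convenient when the same exercise has to be repeated.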

2.9 The User Model 

At the time of writing, the User Model is partly implemented.

The part that has been implemented and is currently tested by end users
is the mapping from analysed properties of the user’s playing to
parameters that are musically meaningful or significant for the user’s
ambitions.

In general this mapping is a linear combination of those properties.
For example, the user’s melodic accuracy can be expressed as a
combination of hitting the correct notes and the lack of spurious or
unwanted notes.

The weight factors are empirically determined, as are several
parameters in the exercises, such as the number of repetitions before
moving on or the required proficiency for increasing the level of an
exercise.
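Such a linear combination could look like the sketch below. The two properties, the function name `melodic_accuracy` and the weight values 0.75 and 0.25 are assumptions made for this example; the empirically determined weights used in PlayGuru are not given in the text.

```python
def melodic_accuracy(matched, missed, spurious, w_hit=0.75, w_clean=0.25):
    """Weighted sum of two properties of a response, each between 0 and 1.

    hit_ratio:   share of the example notes that were played
    clean_ratio: share of the played notes that belong to the example
    """
    expected = matched + missed
    played = matched + spurious
    hit_ratio = matched / expected if expected else 0.0
    clean_ratio = matched / played if played else 0.0
    return w_hit * hit_ratio + w_clean * clean_ratio

print(melodic_accuracy(matched=8, missed=0, spurious=0))   # 1.0
print(melodic_accuracy(matched=6, missed=2, spurious=2))   # 0.75
```

Keeping the weights as parameters makes it straightforward to tune them per user at a later stage.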


This is where the assistance of a machine learning algorithm is wanted:
to learn which weight factors and other parameters contribute to the
user’s goals and to optimise these. This has not been implemented yet
and is currently being studied.


2.10 Melodic similarity 

There are several ways to find out the similarity between the given
example and the musician’s response. In the current research, only the
onset (i.e. start) and pitch of notes are taken into account. Although
timbre, loudness and various other features are extremely useful, these
are ignored for the time being.

In the exercises where the user is asked to memorise and copy an
example melody, the accomplishment of this task is purely based on
hitting the correct notes in the correct order. The similarity however
is also reflected in the timing. The extent to which the musician keeps
the rhythm of the example intact is a property that is measured and
evaluated.

In the exercises where the musician plays along with a piece of music,
we have much more freedom in the assessment. In this case, playing the
exact same notes as in the example is not always necessary. For some
exercises it would suffice to improvise within the scale or play
certain melodic intervals.

In these situations, similarity measures also allow for more freedom.

A method that is used in an exercise called “riff trainer”, focused on
automation of and creating variations on a looped phrase, observes
notes in the proximity of example notes and draws conclusions based on
the objectives of the exercise. This allows for both very strict
adherence to the original melody as well as melodic interval-based
variations, depending on the assumed objectives.

Another method compares the Markov chain of the example with that of
the musician’s response.
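The text does not say how the two chains are built or compared, so the following is only one possible reading: a first-order chain over pitches is estimated for each melody and the chains are compared by the mean absolute difference of their transition probabilities. The function names and the distance measure are assumptions made for this sketch.

```python
from collections import Counter

def transition_probabilities(pitches):
    """First-order Markov chain: P(next pitch | current pitch) from one melody."""
    pair_counts = Counter(zip(pitches, pitches[1:]))
    from_counts = Counter(pitches[:-1])
    return {pair: n / from_counts[pair[0]] for pair, n in pair_counts.items()}

def chain_distance(example, response):
    """Mean absolute difference of transition probabilities (0 = identical chains)."""
    p = transition_probabilities(example)
    q = transition_probabilities(response)
    pairs = set(p) | set(q)
    if not pairs:
        return 0.0
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in pairs) / len(pairs)

example = [60, 62, 64, 62, 60]
print(chain_distance(example, example))                # 0.0
print(chain_distance(example, [60, 62, 64, 65, 60]))   # larger than 0
```

A measure of this kind ignores where in the phrase a transition occurs, so it tolerates reordered or repeated fragments more than a note-by-note comparison does.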

Some inspiration was drawn from a book about melodic similarity
[Walther B. Hewlett, 1998].


3 Personal objectives 


Figure 4 shows that an exercise is ‘composed’ and played. The response
of the musician is assessed and mapped to temporary skills. Long-term
skills are accumulated in a user model, which tries to construct an
accurate profile of the musician and their personal objectives.





















Figure 4: Machine assisted learning supported 
by machine learning 

Examples of these objectives are to strive for faster playing, memorise
long melodies or improve timing accuracy. These need to be expressed as
quantifiable properties. Faster playing can be expressed as playing a
phrase faster than before or playing it faster than the given example,
which can be measured.

Likewise, memorising a melody involves the length of the melody that
can accurately be played at reasonable speed. Obviously, the concepts
‘accurately’ and ‘reasonable’ have to be quantified.

An indication of accuracy in playing is obtained by measuring the
number of spurious notes, missed notes and timing.

Some approaches for machine learning (ML) will be investigated. The
term “machine learning” is used here to express that the machine itself
is learning and does not refer to the “machine assisted learning”,
which is the main topic of this paper. A machine learning algorithm is
thought to be able to help achieve the user’s objectives by adjusting
the mapping parameters that translate measured quantities to short-term
skills, as well as several properties of the exercises.

Depending on the exercise, various factors are measured, such as
matched notes, missed notes, spurious notes, adherence to the scale,
speed and timing accuracy.

These quantities are mapped to short-term skills according to table 2.

Apart from these measured data, the exercises also contain
configuration parameters that can be optimised for each user. These are
found in composer settings and the curves used to control the playing
speed and exercise complexity.


    Measured quantity    Mapped to
    timing deltas        timing accuracy
    missed notes         melodic accuracy
    spurious notes       clutter

Table 2: Mapping measured data to skills
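The note counts that feed this mapping can be obtained in several ways. The sketch below is a deliberately simple, hypothetical variant that compares pitches only and ignores note order and timing; the function name `classify_notes` and the dictionary keys are assumptions, and PlayGuru's Assessor may well match notes differently.

```python
from collections import Counter

def classify_notes(example_pitches, played_pitches):
    """Count matched, missed and spurious notes, ignoring order and timing."""
    expected, played = Counter(example_pitches), Counter(played_pitches)
    matched = sum((expected & played).values())    # played and expected
    missed = sum((expected - played).values())     # expected but not played
    spurious = sum((played - expected).values())   # played but not expected
    return {"matched": matched, "missed": missed, "spurious": spurious}

# The response misses the E (64) and adds an E-flat (63) and an extra F (65).
print(classify_notes([60, 62, 64, 65], [60, 62, 63, 65, 65]))
# {'matched': 3, 'missed': 1, 'spurious': 2}
```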


4 Hardware 

The starting point for this project is to assess the sound of an
unmodified instrument. In this section, the current choice of hardware
is discussed.

A brief side-project was undertaken to equip an acoustic guitar with
resistive sensors for detecting the point where strings are pressed
against the fretboard. This was discarded because it doesn’t look and
feel natural, would imply that all users would need to install a
similar modification and would exclude all instruments other than
guitar.

Because nylon-string acoustic guitar is the primary instrument for the
author as well as for lots of beginning music students, the decision
was to analyse the sound of the instrument with a microphone or some
kind of transducer.

Using a microphone raises the problem that the sound produced by
PlayGuru interferes with the sound of the instrument. Source separation
techniques are not considered viable for this project due to the added
complexity and because we want to be able to play and listen
simultaneously, often to the exact same notes. This would justify a
study in itself.

Requiring the musician to use a headset is also considered undesirable.
So the only option left seems to be a transducer attached to the
instrument.

After some experimenting with various combinations of guitar pickups,
sensors, preamps and audio interfaces, it turned out that a combination
of a simple piezo pickup and a cheap USB audio interface does the job
very well.

Successful measurements were done with the piezo pickup attached to the
far end of the neck of a guitar. It is advised to embed the pickup into
a protective cover to prevent the element itself and the cable from
being exposed to mechanical strain, and to mount it with the piezo’s
metal surface touching the wood of the guitar using a rubber padded
clamp from the DIY store.
Figure 5: Embedded piezo pickup

Figure 6: Stand-alone device for user tests


5 Dissemination 

User experiences and feedback are of crucial importance for the
development of this tool. While a browser application or mobile app
seems an obvious way to reach thousands of musicians, development is
currently done on Linux and Raspbian. This is a deliberate choice,
partly inspired by the author’s lack of experience with Web Audio
intricacies and the reportedly large round-trip audio latency of
Android devices, for which a separate study may be justified.

For a large part however, this choice is supported by the wish to have
an inexpensive stand-alone, self-reliant, single-purpose device.

A series of stand-alone PlayGuru test devices is being developed, based
on a Raspberry Pi with a tactile interface meant to be intuitive to
blind people. The idea is to attach a piezo pickup to the instrument,
plug in, select the exercise and start practicing.

6 Conclusions 

Several beginners, intermediate guitar players and some people with no
previous experience have used PlayGuru’s exercises in various stages of
development. The two most mature exercises used are copying a melody
and playing along with a melody. In most cases the melodies were
generated in real time based on the aforementioned interval-based
rules. In some cases a pre-composed MIDI file was used.

From the start it was clear that users enjoy the fact that PlayGuru
listens to them and rewards “well played” responses with an increase in
speed or by making the assignment slightly more challenging.

Several users mentioned a heightened focus, meaning that they were very
concentrated for a longer time to keep the interaction going and the
level rising. Mistakes bring down the speed or level in a subtle way
and do lead to a slight disappointment, which in many cases proved to
be an incentive to get back into the ‘flow’ of the exercise.

The project is work in progress. For a well-founded opinion on the
practical use, a lot more user tests need to be done, but the author’s
conclusion, based on results so far, is that the approach is promising.


7 Future Work 

At the time of writing, research concentrates on monophonic exercises
with a basic machine learning algorithm.

For the near future the author has plans to perform a study with a
larger group of users who can use the system by themselves for a longer
time. This requires creating test setups and/or porting the software to
other platforms.

Monophonic exercises may be the preferred way to develop several
skills, but being able to play, or play along with, your favourite
music is much more motivating, particularly for children. An approach
resembling Band-in-a-Box© is taken, using a multi-channel MIDI file as
input, with the possibility of indicating which channels will be heard
and which channel will be ‘observed’. This requires both polyphonic
play and analysis, which are largely implemented but currently belong
in the Future Work section. On the analysis side, a technique based on
chroma vectors [Tzanetakis, 2003] is being tested.

Machine learning strategies are being studied but have not yet been
implemented. The assistance of co-developers would be much appreciated.
9 Other solutions for computer-assisted music education

• i-maestro 

• Bart’s Virtual Music School 

• Rocksmith ™ 

• yousician.com 

• bestmusicteacher.com 

• onlinemuziekschool.nl 

• gitaartabs.nl 
Abstract

Faust (Functional Audio Stream) is a functional programming language
specifically designed for real-time signal processing and synthesis
[1]. It consists of a compiler that translates a Faust program into an
equivalent C++ program, taking care of generating the most efficient
code. JUCE is an open-source cross-platform C++ application framework
developed since 2004, and bought by ROLI¹ in November 2014, used for
the development of desktop and mobile applications. A new feature of
the Faust environment is the addition of architecture files that
provide the glue between the Faust C++ output and the JUCE framework.
This article presents the overall design of the architecture files for
JUCE.

Keywords 

JUCE, Faust, Domain Specific Language, DSP, 
real-time, audio 

1 Introduction 

From a technical point of view, Faust² (Functional Audio Stream) is a
functional, synchronous, domain specific language designed for
real-time signal processing and synthesis. A unique feature of FAUST,
compared to other existing languages like Max, PD, SuperCollider, etc.,
is that programs are not interpreted, but fully compiled.

One can think of FAUST as a specification language. It aims at
providing the user with an adequate notation to describe signal
processors from a mathematical point of view. This specification is
free, as much as possible, from implementation details. It is the role
of the FAUST compiler to provide automatically the best possible
implementation. The compiler translates FAUST programs into equivalent
C++ programs taking care of generating the most efficient code. The
compiler offers various options to control the generated code,
including options to do fully automatic parallelization and take
advantage of multicore machines.

1 https://roli.com/

2 http://faust.grame.fr

The generated code can generally compete with, and sometimes even
outperform, C++ code written by seasoned programmers. It works at the
sample level and is therefore suited to implementing anything from
low-level DSP functions like recursive filters up to full-scale audio
applications. It can be easily embedded as it is self-contained and
does not depend on any DSP library or runtime system. Moreover it has a
very deterministic behavior and a constant memory footprint.

Being a specification language, the FAUST code says nothing about the
audio drivers or the GUI toolkit to be used. It is the role of the
architecture file to describe how to relate the DSP code to the
external world [2]. This approach allows a single FAUST program to be
easily deployed to a large variety of audio standards (Max-MSP
externals, PD externals, VST plugins, CoreAudio applications, JACK
applications, etc.), and JUCE is now supported.

The aim of JUCE [3] is to allow software to be written such that the
same source code will compile and run identically on Windows, Mac OS X
and Linux platforms for desktop devices, and on Android and iOS for
mobile ones. A notable feature of JUCE when compared to other similar
frameworks is its large set of audio functionality. Those services, the
user-interface possibilities and the multi-platform exportability make
JUCE a great framework for FAUST to be exported to, with less code to
keep up to date in the future and simpler use.

In section 2, the idea and the use of the GUI 
architecture file will be introduced. In section 
3, the JUCE Component hierarchy will be pre¬ 
sented without going into many details. Sec¬ 
tion 4 is the main one, explaining in detail the 
graphical architecture file for JUCE. MIDI and 
OSC architecture files are introduced in Section 
5. Section 6 will treat of the "glue" between 
JUCE audio layers and Faust ones. Section 7 



LAC2017 - CIEREC - GRAME - Universite Jean Monnet - Saint-Etienne - France 


62 


presents the faust2juce script. Section 8 is a support the following methods to setup this hi- 
quick tutorial on how to use JUCE for FAUST. erarchy: 


2 Faust GUI architecture files 

A FAUST UI architecture is a glue between a host control layer and
a FAUST module. It is responsible for associating a FAUST module
parameter with a user interface element and for updating the
parameter value according to the user actions. This association is
triggered by the dsp::buildUserInterface call, where the DSP asks a
UI object to build the module controllers.

Since the interface is basically graphic oriented, the main
concepts are widget based: a UI architecture is semantically
oriented to handle active widgets, passive widgets and widget
layout.

A FAUST UI architecture derives a UI class, containing active
widgets, passive widgets, layout widgets, and metadata.


2.1 Active widgets 

Active widgets are graphical elements that control a parameter
value. They are initialized with the widget name and a pointer to
the linked value. The widgets currently considered are Button,
ToggleButton, CheckButton, RadioButton, Menu, VerticalSlider,
HorizontalSlider, Knob and NumEntry.

A UI architecture must implement a method
addxxx(const char* name, float* zone, ...) for each active widget.
Additional parameters are available for Slider, Knob, NumEntry,
RadioButton and Menu: the init value, the min and max values and
the step (RadioButton, Menu and Knob being special kinds of
Sliders, cf. Subsection 2.4, Metadata).


2.2 Passive widgets

Passive widgets are graphical elements that reflect values.
Similarly to active widgets, they are initialized with the widget
name and a pointer to the linked value. The widgets currently
considered are NumDisplay, Led, HorizontalBarGraph and
VerticalBarGraph.

A UI architecture must implement a method
addxxx(const char* name, float* zone, ...) for each passive widget.
Additional parameters are available, depending on the passive
widget type (NumDisplay and Led being special kinds of BarGraph,
cf. Subsection 2.4).


2.3 Widget layout 

Generally, a UI is hierarchically organized into boxes and/or tab
boxes. A UI architecture must support the following methods to set
up this hierarchy:

openTabBox(const char* label)
openHorizontalBox(const char* label)
openVerticalBox(const char* label)
closeBox()

Note that all the widgets are added to the current box.
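As an illustration of Subsections 2.1 to 2.3, the sketch below
shows a much reduced UI interface and a text-only implementation
that records the hierarchy it is asked to build. The names MiniUI
and PrintUI are assumptions made for this example; the real FAUST
UI class has more methods (other active widgets, passive widgets,
declare, etc.).

```cpp
#include <string>
#include <vector>

// Hypothetical reduced version of the FAUST UI interface (illustration only).
struct MiniUI {
    virtual ~MiniUI() = default;
    // Layout widgets
    virtual void openTabBox(const char* label) = 0;
    virtual void openHorizontalBox(const char* label) = 0;
    virtual void openVerticalBox(const char* label) = 0;
    virtual void closeBox() = 0;
    // One active widget: name, linked value, init, min, max, step
    virtual void addVerticalSlider(const char* name, float* zone,
                                   float init, float min, float max,
                                   float step) = 0;
};

// Text-only implementation: records one indented line per box or widget.
struct PrintUI : MiniUI {
    std::vector<std::string> lines;
    int depth = 0;  // nesting level of the current box

    void open(const char* kind, const char* label) {
        lines.push_back(std::string(2 * depth, ' ') + kind + " " + label);
        ++depth;
    }
    void openTabBox(const char* label) override { open("tab", label); }
    void openHorizontalBox(const char* label) override { open("hbox", label); }
    void openVerticalBox(const char* label) override { open("vbox", label); }
    void closeBox() override { --depth; }

    void addVerticalSlider(const char* name, float*, float, float, float,
                           float) override {
        // Widgets are always added to the current (innermost) box.
        lines.push_back(std::string(2 * depth, ' ') + "vslider " + name);
    }
};
```

Passing a PrintUI to a buildUserInterface-like sequence of calls
yields one line per box or widget, indented by nesting level, which
is a convenient way to inspect the hierarchy before any graphical
toolkit is involved.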

2.4 Metadata 

The FAUST language allows widget labels to contain metadata
enclosed in square brackets. These metadata are handled at UI level
by a declare method taking as arguments a pointer to the widget
associated value, the metadata key and value:

declare(float*, const char*, const char*)

Metadata can also declare a DSP as polyphonic, with a line looking
like declare nvoices "8" for 8 voices. This will always output a
polyphonic DSP, whether or not the polyphonic option of the
compiler is used. This number of voices can be changed with the
compiler (cf. Section 7).

For instance, if the program needs a Slider to be a Knob, the
following lines are written:

declare(&fVslider0, "style", "knob");
addVerticalSlider("Vol", &fVslider0, ...);

The style can be knob, menu, etc., depending on the program.

Multiple aspects of the items can be described with the metadata,
such as the type of the item as seen above, the tooltip of the
item, the unit, etc.
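A minimal sketch of how a UI class might store such metadata and
look it up when the widget is created. The MetaStore class and its
styleOf helper are hypothetical names introduced for this example;
only the declare(float*, const char*, const char*) signature comes
from the text above.

```cpp
#include <map>
#include <string>

// Hypothetical helper: remembers the metadata declared for each zone.
class MetaStore {
    // zone address -> (key -> value)
    std::map<float*, std::map<std::string, std::string>> fMeta;

public:
    // Same signature as the declare method described above.
    void declare(float* zone, const char* key, const char* value) {
        fMeta[zone][key] = value;
    }

    // Returns the "style" declared for this zone, or fallback if none.
    std::string styleOf(float* zone, const std::string& fallback) const {
        auto z = fMeta.find(zone);
        if (z == fMeta.end()) return fallback;
        auto s = z->second.find("style");
        return s == z->second.end() ? fallback : s->second;
    }
};
```

Since declare is called before the addVerticalSlider call for the
same zone, the UI can query the stored style at widget creation
time and build a knob instead of a slider.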

3 JUCE Component class 

To implement a complete program, the graphical elements described
in the previous section need to be combined with JUCE classes. In
the JUCE framework, the Component class is the base class for all
JUCE user-interface objects. The following section explains the
relationship between FAUST GUI architecture files and the JUCE
mechanics.

3.1 Parent and child mechanics 

Like most frameworks, JUCE has a hierarchy of Component objects,
organized in a tree structure. The common way to set a Component as
child of another component is to call
parent->addAndMakeVisible(child);. This function also makes the
child component visible, because it is not visible by default.
Several methods allow traversing this Component tree: they give the
child Component at index i, or the parent. There is even a function
that returns the parent of a Component with a specific type, this
type being a derived class of Juce::Component. However, no such
function exists for children, which implies that a dynamic_cast has
to be done to get a child of a certain type.
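The dynamic_cast search mentioned above can be sketched as follows.
To keep the example self-contained, Node is a stand-in written for
this illustration and not the JUCE Component class; the helper name
findChildOfType is likewise an assumption.

```cpp
#include <vector>

// Stand-in for a component tree node (NOT the JUCE class).
struct Node {
    virtual ~Node() = default;
    std::vector<Node*> children;  // non-owning links to the children
};

struct Box : Node {};
struct Slider : Node {};

// Returns the first direct child of type T, or nullptr if there is none.
template <typename T>
T* findChildOfType(const Node& parent) {
    for (Node* child : parent.children) {
        if (T* typed = dynamic_cast<T*>(child)) return typed;
    }
    return nullptr;
}
```

With real JUCE components the loop would presumably iterate over
the child components of the parent instead of a plain vector, but
the cast itself would be the same.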

3.2 Component setup mechanics 

First of all, a Component is drawn only if it is visible and its
parent is visible too. If a Component is not visible, none of its
descendants will be visible either, but as the addAndMakeVisible
function is used most of the time, this should not be a problem. A
Component has a Rectangle<int> boundsRelativeToParent, containing
its x and y coordinates, and its width and height. As the variable
name implies, the bounds of a Component are relative to its parent,
and not absolute in the window; this is very important in the
architecture files for FAUST, as will be demonstrated in
Subsection 4.4.

3.3 Drawing mechanics 

A Component has two virtual functions³ that are the main tools to
handle a dynamic layout: void resized() and
void paint(Graphics& g). The resized function is called each time
the bounds of a Component are changed, and the paint function when
the Component flag indicates that it needs to be repainted. For
example, the mouse cursor hovering over it, a mouse click, a change
of the Component bounds, or one or more of its children needing to
be repainted all indicate that it needs to be repainted.

There is a design class called LookAndFeel that allows
customization of the interface. A LookAndFeel object defines the
appearance of all the JUCE widgets, and subclasses can be used to
apply different "skins" to the application.

There is obviously a lot more to the Juce::Component class, but
these are the basics, or at least what the architecture files need.

4 JuceGUI architecture file 

To summarize what has been seen before, the system of widgets and
boxes of FAUST needs to be adapted to the Juce::Component mechanics
in an architecture file called JuceGUI.h. The following section
discusses annotated examples.

³ Placeholder functions which the programmer must implement.

4.1 Two different kinds of objects 

There are two kinds of objects used in the adaptation:

• uiComponent, which covers basically any item of the FAUST
program, like sliders or buttons.

• uiBox, which is a container component, and so can contain
uiComponent objects or other uiBox objects.

Both are derived classes of a uiBaseComponent class, which is
itself a derived class of Juce::Component.

The uiBaseComponent class groups methods shared by both uiBox and
uiComponent, like void setRatio(), int getTotalWidth(), etc. This
way, too many dynamic_cast in our code are avoided. Here is what
the uiBaseComponent class contains:

float fHRatio, fVRatio;
int fTotalWidth, fTotalHeight;
int fDisplayRectHeight, fDisplayRectWidth;
String fName;

uiBaseComponent(int totWidth, int totHeight, String name);

int getTotalHeight();
int getTotalWidth();
virtual void setRatio();
float getHRatio();
float getVRatio();
String getName();
void setHRatio();
void setVRatio();
void setBaseComponentSize(Rectangle<int> r);
void mouseDoubleClick(const MouseEvent& event) override;
virtual void writeDebug() = 0;
virtual void setCompLookAndFeel(LookAndFeel* laf) = 0;

The mouseDoubleClick function is a JUCE overridable function, which
is called every time a Component is double-clicked. Here it is used
to call the writeDebug function, showing different characteristics
of the double-clicked uiBox or uiComponent.

The two pure virtual functions are declared so that uiBox and
uiComponent each provide their own behavior, which obviously
differs between the two.

The virtual void setRatio(); function is virtual because there is a
special case with the uiBox, which sets its own ratio and also
needs to ask its children to set their ratios, in a recursive way.

As said before, uiComponent inherits those uiBaseComponent
functions, and is itself a parent class for many different
widgets. Here is the inheritance diagram:



Figure 1: Inheritance diagram 


A uiComponent subclass can handle multiple "types" of items.

For instance, uiSlider groups every kind of slider:
HorizontalSlider, VerticalSlider, NumEntry and Knob.

4.2 The main window 

To remain readable and clear, the user interface cannot be shrunk
indefinitely, so a minimal window size is defined. This implies
that instead of a basic Component in a DocumentWindow (a resizable
window with a title bar and maximise, minimise and close buttons),
a Viewport in a DocumentWindow is used. It displays scrollbars when
the window becomes smaller than the minimal size of the FAUST DSP
program, giving full access to the user interface even at those
smaller dimensions.

This Viewport can contain either a uiBox as presented before, or a
uiTabs if the program requires tabs.


4.3 uiTabs class 

The uiTabs class inherits from Juce::TabbedComponent, which is a
Juce::Component with a TabbedButtonBar on one of its sides. It just
needs a Juce::Component and a tab name for each tab, and it will
display them.

A tab layout is needed when buildUserInterface starts with an
openTabBox call. In this case, a boolean tabLayout is set to true,
to record that it is a tab layout.

While parsing buildUserInterface, a uiBox is given to the uiTabs
every time the current tab is "closed". To do that, a variable
called order keeps track of the "level" of the current box. The
order starts at 0, is incremented when a new box is opened, and
decremented when a box is closed. If the order is 0 in a closeBox()
call, then a tab is being closed, and so the current box is added
to the uiTabs, using the TabbedComponent::addTab function.

Once all the tabs are closed, the tab box is closed too, the order
is now at -1, and this triggers the initialization function of
uiTabs, uiTabs::init(). It will be described in the next
subsection.
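The bookkeeping done with the order variable can be sketched as
below. This is only one reading of the description: it assumes that
openTabBox() itself does not change the order, and that the
comparisons with 0 and -1 are made after the decrement in
closeBox(). The TabTracker type and its members are hypothetical
names; strings stand in for the uiBox objects.

```cpp
#include <string>
#include <vector>

// Hypothetical tracker for the "order" counter described above.
struct TabTracker {
    bool tabLayout = false;
    int order = 0;
    bool initialized = false;       // stands for the uiTabs::init() call
    std::vector<std::string> tabs;  // stands for TabbedComponent::addTab
    std::vector<std::string> open;  // labels of the currently open boxes

    void openTabBox() { tabLayout = true; }

    void openBox(const std::string& label) {
        open.push_back(label);
        ++order;
    }

    void closeBox() {
        --order;
        if (tabLayout && order == 0) {
            tabs.push_back(open.back());  // a whole tab has just been closed
        } else if (tabLayout && order == -1) {
            initialized = true;           // the tab box itself is closed
        }
        if (!open.empty()) open.pop_back();
    }
};
```

With two tabs, one of them containing a nested box, the tracker
registers exactly two tabs and triggers the initialization only on
the final closeBox() call.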

4.4 Initialization of the layout 

First of all, while parsing the buildUserInterface lines, which
list the different boxes and items that need to be displayed, the
tree is built. This is done using the Juce::Component mechanics of
addAndMakeVisible. The different uiBaseComponent objects are added
as children of different uiBox objects, and the display rectangle
size and total size of a uiBox are calculated every time a box is
closed in buildUserInterface (i.e. when closeBox() is called).

For a horizontal uiBox, the display rectangle width is the sum of
its children's widths and its height is the maximum of its
children's heights; for a vertical box the roles of width and
height are swapped. Margins are then added to the display rectangle
width and height to obtain the uiBox total size: 4 pixels per
child, for a margin of 2 pixels on the top, left, bottom and right.
This avoids an overlapping effect, with two items touching each
other. In the same spirit, 12 pixels are added to the height of the
box if its name needs to be displayed, 12 pixels being the space
needed to display its name.

Here is the buildUserInterface code that displays the interface of
Figure 2:

ui_interface->openHorizontalBox("TITLE1");
ui_interface->addVerticalSlider("Slider1",
    &fVslider0, 0.0f, 0.0f, 6.0f, 1.0f);
ui_interface->addVerticalSlider("Slider2",
    &fVslider1, 0.0f, 0.0f, 6.0f, 1.0f);
ui_interface->addVerticalSlider("Slider3",
    &fVslider2, 0.0f, 0.0f, 6.0f, 1.0f);
ui_interface->addVerticalSlider("Slider4",
    &fVslider3, 0.0f, 0.0f, 6.0f, 1.0f);
ui_interface->closeBox();

Figure 2: Representation of the display rectangle size and
the total size of a box with four children

In Figure 2, the difference between the display rectangle size and
the total size can easily be seen. The total size of the box, here
named "TITLE1", is the lighter gray area, and the display rectangle
size would be the four darker gray rectangles stuck together. The
layout is not aligned seamlessly because of the margin that is
implemented to avoid the overlapping of the components.

The space left at the top of the box is for its title, and this
margin is included in the total size.


\begin{align}
h &= \sum_{i=0}^{n-1} c_i.H \tag{1} \\
w &= \max_{i \in [0,\,n-1]} c_i.W \tag{2} \\
H &= h + 4 \cdot n \tag{3} \\
W &= w + 4 \tag{4}
\end{align}


In these equations, which sum the heights and therefore describe a
vertical box, H is the total height, W the total width, h the
display rectangle height, and w the display rectangle width; c_i is
the i-th child component of the current box and n the number of
children.

H might be incremented by 12 pixels, depending on the need to
display the box name.

4 pixels per child component are added along one dimension to leave
margins between the children, because they are placed next to each
other in this dimension, and just 4 pixels are added to the other
dimension to leave 2 pixels between parent and child box on each
side.
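The size computation done when a box is closed can be sketched as
follows, applying equations (1) to (4) for a vertical box and
swapping width and height for a horizontal one. The names Size,
BoxSizes and computeBoxSizes are assumptions made for this example;
the 2 and 12 pixel constants are the ones given in the text.

```cpp
#include <algorithm>
#include <vector>

struct Size { int width; int height; };
struct BoxSizes { Size display; Size total; };

constexpr int kMargin = 2;       // margin on each side of a child
constexpr int kNameHeight = 12;  // space needed to display the box name

// childTotals holds the total size of each child of the box.
BoxSizes computeBoxSizes(const std::vector<Size>& childTotals,
                         bool vertical, bool showName) {
    int sum = 0, maxOther = 0;
    for (const Size& c : childTotals) {
        sum += vertical ? c.height : c.width;  // stacking dimension
        maxOther = std::max(maxOther, vertical ? c.width : c.height);
    }
    const int n = static_cast<int>(childTotals.size());
    const int stackedTotal = sum + 2 * kMargin * n;  // 4 pixels per child
    const int otherTotal = maxOther + 2 * kMargin;   // 4 pixels in all

    BoxSizes r;
    r.display = vertical ? Size{maxOther, sum} : Size{sum, maxOther};
    r.total = vertical ? Size{otherTotal, stackedTotal}
                       : Size{stackedTotal, otherTotal};
    if (showName) r.total.height += kNameHeight;
    return r;
}
```

For a horizontal box without a displayed name and with two children
of total sizes 700x500 and 300x500, this gives a display rectangle
of 1000x500 and a total size of 1008x504.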

Once buildUserInterface is done, the last box is closed, and the
user interface is initialized. This last box, which will be called
the "main box", is initialized with ratios of 1 and 1, even though
they are not really needed, because it will take the window size.
Here is how the UI is initialized:

• Setting the actual rendering size for the main box, because only
the total size has been set so far, not the Juce::Component bounds.
This is done through the
void setBaseComponentSize(Rectangle<int> r) method, which sets the
size of the components and, above all, positions them correctly.
Concretely, a 30 pixel offset is needed on the height for a tab
layout, 30 pixels being the height taken by the tab bar. Only the
main box needs to be set with an offset, because the other boxes
are positioned relative to their parent's coordinates.

• After that, the ratios are calculated for the whole tree, from
root to leaves. The horizontal ratio is the component total width
divided by its parent display rectangle width, and likewise for the
height. This keeps the margins out of the ratios, so that the
ratios of the children sum to exactly 1 instead of only
approaching 1.

• The last step is to set the LookAndFeel for all uiComponents,
which are all leaves of the tree. So the tree is fully traversed
here, from root to leaves.
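The ratio rule of the second step can be illustrated with a small
helper. The function name horizontalRatios is an assumption made
for this sketch, which only covers the children of a horizontal
box.

```cpp
#include <vector>

// Hypothetical helper: horizontal ratios of the children of a horizontal
// box. Each ratio is the child total width divided by the parent display
// rectangle width, which does not include the margins.
std::vector<float> horizontalRatios(const std::vector<int>& childTotalWidths,
                                    int parentDisplayWidth) {
    std::vector<float> ratios;
    for (int w : childTotalWidths) {
        ratios.push_back(static_cast<float>(w) / parentDisplayWidth);
    }
    return ratios;
}
```

With children of total widths 700 and 300 in a box whose display
rectangle is 1000 pixels wide, the ratios are 0.7 and 0.3. Dividing
by the total width of the box (1008) instead would give ratios
whose sum stays below 1, which is the effect the text wants to
avoid.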

The only possible change in the initialization of the program is in
the case of a tab layout. The uiTabs::init() method just calls
uiBox::setRatio() and uiBox::setCompLookAndFeel(LookAndFeel*) for
each of its tab components.

While going through all the tabs, the algorithm keeps track of the
minimal size at which the uiTabs component can be displayed, its
minimal dimensions being the maximum width and the maximum height
of all its tabs.

At this point, the tree is built, and the total size, the display
rectangle size and the ratios have been initialized for all
components, both uiBox and uiComponent.

4.5 Dynamic Layout 

At that point, the user interface is displayed at its original
size, but it needs to adapt to the potential resizing of the
window. To do that, the uiBoxes are used to lay out all the items.
A uiBox item has a
void arrangeComponents(Rectangle<int> functionRect) function, which
is the main tool to organize the layout. It is called whenever the
resized() function of the main box is called.

In this function, the initial rectangle given as argument, which is
basically the window size, propagates through all the child uiBox
and uiComponent objects, in a recursive way [4].

At the beginning, the function checks whether the name needs to be
displayed and, as no child component should be displayed there, it
cuts 12 pixels from the top of the functionRect given as argument.

After that, the margins are set, so 2 pixels are cut on the left,
top, right and bottom sides. This way, overlapping components are
avoided. Once this is done, it goes through all the children, to
give each of them the right space to occupy and the right position.

The algorithm works as follows: if the current box is vertical, it
gives each child a vertical slice of functionRect, and a horizontal
slice for a horizontal box. The vertical or horizontal extent of
the child is calculated, again depending on the orientation of the
current box. This size is the current height or width of the box,
minus the margins, multiplied by the vertical or horizontal ratio.
As a concrete example, the current box is a horizontal one and has
2 child components, one having a horizontal ratio of 0.7 and the
other one of 0.3. The box display size is here 1000x500 pixels, and
its total size 1008x504 (2 items in a horizontal box, so
2 * (2 * margin) = 8 on the width, and 2 * margin = 4 on the
height).

Let us say the size of the window has almost doubled, and is now
2008x1004 (arbitrary simple values). The algorithm calculates that
the first item gets a 0.7 * (2008 - 2 * 4) = 0.7 * 2000 = 1400
pixel wide space and the second one 0.3 * (2008 - 2 * 4) = 600
pixels. The first item bounds will be 1400x1000 and the second one
600x1000, the height being the same for both, margins excluded.

On top of that, to keep track of where to place the components, the
functionRect gets cut off little by little every time a
uiBaseComponent is given a rectangle to be displayed in [4].
Basically, every rectangle that is given to a child is removed from
the original functionRect, which allows us to keep track of the
correct x and y coordinates to give to the child component, with
the margin added. This is done over and over again for each child
component, cutting from the left or the top of the box
Rectangle<int> rect depending on its orientation.
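The cutting strategy can be sketched for a horizontal box as below.
Rect is a small stand-in written for this example: its
removeFromLeft and reduced operations are modelled on the JUCE
Rectangle<int> methods of the same name, without their clamping.
The function arrangeHorizontally is a hypothetical name, and the
sketch assumes that the box name is not displayed and that each
child receives its strip with a 2 pixel margin on every side.

```cpp
#include <cmath>
#include <vector>

// Minimal integer rectangle with the two operations used below.
struct Rect {
    int x, y, w, h;

    // Cuts a strip of the given width off the left side and returns it.
    Rect removeFromLeft(int amount) {
        Rect strip{x, y, amount, h};
        x += amount;
        w -= amount;
        return strip;
    }
    // Returns a copy shrunk by delta pixels on each of the four sides.
    Rect reduced(int delta) const {
        return Rect{x + delta, y + delta, w - 2 * delta, h - 2 * delta};
    }
};

constexpr int kMargin = 2;

// Lays out the children of a horizontal box inside boxRect, given their
// horizontal ratios. Returns the bounds of each child, margins excluded.
std::vector<Rect> arrangeHorizontally(Rect boxRect,
                                      const std::vector<double>& ratios) {
    const int n = static_cast<int>(ratios.size());
    const int available = boxRect.w - 2 * kMargin * n;  // width minus margins
    std::vector<Rect> bounds;
    for (double ratio : ratios) {
        const int childWidth =
            static_cast<int>(std::lround(ratio * available));
        // Cut the child strip (margins included) from the left, then shrink.
        bounds.push_back(
            boxRect.removeFromLeft(childWidth + 2 * kMargin).reduced(kMargin));
    }
    return bounds;
}
```

Applied to the 2008x1004 window of the example with ratios 0.7 and
0.3, the first child gets a 1400x1000 rectangle at x = 2 and the
second one a 600x1000 rectangle at x = 1406, which matches the
figures computed in the text.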





Figure 3: Representation of the layout algorithm


4.6 The MainContentComponent class

The adapted MainContentComponent class includes a number of FAUST
headers that are indispensable for the FAUST program. There are
also some optional includes, for OSC, MIDI and polyphonic mode,
that depend on the compilation options set by the user.

The MainContentComponent class is the Juce::Component contained in
our Viewport, and itself contains a JuceGUI object, which is a
subclass of Juce::Component, of the FAUST GUI class and of
MetaDataUI. The minimal things to do are:


addAndMakeVisible(juceGUI);
fDSP = new mydsp();
fDSP->buildUserInterface(&juceGUI);
recommendedSize = juceGUI.getSize();
setSize(recommendedSize.getWidth(),
        recommendedSize.getHeight());
setAudioChannels(fDSP->getNumInputs(),
                 fDSP->getNumOutputs());

[...]

private:

JuceGUI juceGUI;

A simple buildUserInterface call is needed, then the size of the
MainContentComponent and the number of audio channels are set. In
the same spirit, there is optional code for the MIDI, OSC or
polyphonic modes.

5 Other Faust architecture files 

A GUI architecture file alone is not enough to run a FAUST program
on JUCE: adaptations for other kinds of control, such as OSC and
MIDI, are also needed.

5.1 OSC 

OSC integration has been done by developing a new JuceOSCUI class,
subclass of the base UI class. A send port and a receive port are
defined. Input OSC messages are decoded by subclassing the JUCE
OSCReceiver class and implementing its
OSCReceiver::oscMessageReceived method. Output OSC messages are
sent using the OSCSender::send method.

The special "hello" message allows several parameters of the FAUST
application to be retrieved: its root OSC port, IP address, input
and output ports. The "get" message allows the current, min and max
values of a given parameter to be retrieved. Finally, a float value
received on a given path changes the parameter value in real time.

An application wanting to be controlled by OSC messages has to use
an instance of the JuceOSCUI class, to be given to the DSP
buildUserInterface method.

5.2 MIDI 

MIDI message handling is done using the MidiInput and MidiOutput
JUCE classes. A new juce_midi class, subclassing MidiInputCallback
and implementing the required
MidiInputCallback::handleIncomingMidiMessage method, has been
defined. MIDI messages coming from the JUCE layer are decoded and
sent to the corresponding application controllers. MIDI messages
produced by the application controllers are encoded and sent using
a MidiOutput object.

An application wanting to be controlled by MIDI messages has to use
an instance of the MidiUI class, created with a juce_midi handler,
to be given to the DSP buildUserInterface method.

6 Audio integration 

To be connected to the external world, a given FAUST DSP has to be
connected to an audio driver and a user interface definition. The
JUCE framework already contains an abstract audio layer connected
to a set of native audio drivers on all development platforms. JUCE
developers can choose to deploy their code as standalone audio
applications or audio plugins. A standalone application has to
subclass the abstract AudioAppComponent class and implement the
prepareToPlay, getNextAudioBlock and releaseResources methods:

• prepareToPlay is called just before audio processing starts, with
a sample rate parameter. The FAUST DSP is initialized with this
sample rate value, and the number of input/output channels is
possibly adapted to match the capabilities of the native layer in
use (which can have a different number of input/output channels
than the DSP).

• getNextAudioBlock is called every time the audio hardware needs a
new block of audio data. Audio buffers presented as an
AudioSourceChannelInfo data type are retrieved and adapted to be
given to the FAUST DSP compute method.

• releaseResources is called when audio processing has finished.
Nothing special has to be done at the FAUST level.

7 The faust2juce script 

There are many scripts available in the FAUST ecosystem that
generate a ready-to-use binary, project file, or compiled file from
a simple DSP file. They are named faust2xxx.

In the same spirit, a faust2juce script has been implemented, which
creates a JUCE project directory from a simple DSP file. The
command is used as follows:

faust2juce [-options] dspFile.dsp 

This will create a folder containing a .jucer file, and a "Source"
folder containing Main.cpp and MainComponent.h. This folder is
self-contained: all needed FAUST includes are in MainComponent.h,
including the compiled DSP.

These are the options currently available for faust2juce:

• -nvoices x: produces a polyphonic self-contained DSP with x
voices, ready to be used with MIDI events

• -midi: activates MIDI control

• -osc: activates OSC control

• -help: shows the different options available

As described in Subsection 2.4, a number of voices can be hardcoded
for a polyphonic DSP, but it can be changed with the -nvoices
option, which takes priority over the metadata declaration. For a
DSP with no hardcoded polyphony, this compiler option simply makes
it polyphonic. Other options will be added later, as the script is
still in development.


[3] JUCE online documentation, https://www.juce.com/doc/classes

[4] JUCE "Tutorial: Advanced Rectangle techniques",
https://www.juce.com/doc/tutorial_rectangle_advanced

[5] J. Storer, "Developing Graphical User Interfaces with JUCE",
JUCE Summit 2015, https://www.youtube.com/watch?v=xsCZoEls_uw


8 How to use JUCE architecture files

Using JUCE to export a FAUST DSP program file is easy: create the
project folder with faust2juce [-options] dspFile.dsp and drag and
drop the created folder, named after the DSP, into the "example"
folder contained in the JUCE git folder.

Simply open the .jucer file and select "Save Project and Open in
IDE...", the first time at least, to generate the JUCE header
files, etc. The program is then ready to be run on whichever export
platform was chosen.


9 Conclusion 

Implementing FAUST audio DSP programs is now possible with JUCE,
and they can theoretically be exported to every platform that JUCE
supports. This has been tested on OS X and iOS: both work
correctly, with performance close to that of already available
options, such as faust2caqt for OS X and faust2ios for iOS.

MIDI control, polyphonic mode, and OSC control are implemented, and
more features are under development, to allow full compatibility
with the whole FAUST library.

JUCE offers two types of "audio project", standalone applications
or plug-ins. Currently the FAUST architecture files are limited to
describing standalone applications, but we are looking forward to
adapting our code for plug-ins.

Abstract

The Faust architecture files ecosystem is regularly enriched with
new targets to deploy Digital Signal Processing (DSP) programs.
This paper presents recently developed techniques to expand the
standard "one DSP source, one program or plugin" model, and to
better control parameter changes during the audio computation.
Sample accurate control and polyphonic instrument definition have
been introduced, and will be explained particularly in the context
of MIDI control.

Keywords 

Faust, DSP programming, audio, MIDI 

1 Introduction 

FAUST is a functional programming language specifically designed
for real-time signal processing and synthesis. From a high-level
specification, its compiler typically generates the DSP computation
as a C++ class¹ to be wrapped by so-called architecture files and
connected to the external world.

1.1 Audio and UI Architecture Files 

Native audio drivers are developed as subclasses of a base audio
class, controllers as subclasses of a base UI class. Typical
Graphical User Interface architectures are based on well
established frameworks like QT² or JUCE³ and display a ready-to-use
window with sliders, text zones and buttons. Audio and UI parts are
finally combined with the actual DSP computation to produce the
final audio application or plugin (see Figure 1).

Figure 1: DSP code is generated by the compiler, audio and UI codes
are added from the generic architecture files.

Non graphical controllers can also be defined as subclasses of UI,
simply by ignoring the layout description⁴ and just keeping the
actual controls definition (with their name, default value, value
range etc.). OSCUI and httpdUI classes [1] typically follow this
strategy.

¹ The faust2 development branch can also generate C, LLVM IR,
WebAssembly etc. target languages.
² http://doc.qt.io
³ https://www.juce.com/doc/classes
⁴ Typically done using hgroup, vgroup or tgroup in the DSP source
code.

New architecture files have been regularly added to the already
rich FAUST ecosystem, to expand the variety of possible targets for
the DSP code.

1.2 Macro Construction of DSP Components

The FAUST program specification is usually entirely done in the
language itself. But in some specific cases it may be useful to
develop separate DSP components and combine them in a more complex
setup.

Since taking advantage of the huge number of already available UI
and audio architecture files is important, keeping the same dsp API
is preferable⁵, so that more complex DSP can be controlled and
audio rendered the usual way:

class dsp {
public:
    // Number of audio inputs and outputs of the DSP
    virtual int getNumInputs() = 0;
    virtual int getNumOutputs() = 0;
    // Asks the given UI object to build the controllers
    virtual void buildUserInterface(UI* ui) = 0;
    virtual void init(int samplingRate) = 0;
    // Renders count frames from inputs to outputs
    virtual void compute(int count,
                         FAUSTFLOAT** inputs,
                         FAUSTFLOAT** outputs) = 0;
};

⁵ Only part of the complete DSP API is presented here.

Extended DSP classes will typically subclass the dsp root class and
override part of its API.

This paper shows how this approach can be used to define new
extended and combinable dsp classes. Section 2 describes tools to
combine separately developed DSP. Section 3 explains how sample
accurate parameter control of a given DSP can be done using the new
timed_dsp class, and when it needs to be used.

Section 4 presents the model used to deploy polyphonic instruments,
Section 5 presents how the previously presented components can be
used together in the context of MIDI control, and finally the
conclusion tries to widen this work into a more general analysis of
the FAUST compiler generated code.

2 Combining DSP 

2.1 Dsp Decorator Pattern 

A dsp_decorator class, subclass of the root dsp class, has first
been defined. Following the decorator design pattern⁶, it allows
behavior to be added to an individual object, either statically or
dynamically.

The extended DSP class hierarchy is shown in Figure 2. As an
example of the decorator pattern, the timed_dsp class allows a
given DSP to be decorated with sample accurate control capability,
as explained in Section 3.
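The decorator idea can be sketched with reduced stand-ins for the
FAUST classes. The types mini_dsp, mini_decorator, frame_counter
and half_gain below are hypothetical and written for this example;
they only mirror the part of the dsp API shown in Subsection 1.2,
with the UI method left out for brevity.

```cpp
// Reduced stand-ins for the FAUST types, for illustration only.
typedef float FAUSTFLOAT;

struct mini_dsp {
    virtual ~mini_dsp() = default;
    virtual int getNumInputs() = 0;
    virtual int getNumOutputs() = 0;
    virtual void init(int samplingRate) = 0;
    virtual void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) = 0;
};

// Decorator: forwards every call to the wrapped DSP. Subclasses override
// only the methods whose behaviour they want to extend.
struct mini_decorator : mini_dsp {
    mini_dsp* fDSP;  // wrapped DSP, not owned in this sketch
    explicit mini_decorator(mini_dsp* dsp) : fDSP(dsp) {}
    int getNumInputs() override { return fDSP->getNumInputs(); }
    int getNumOutputs() override { return fDSP->getNumOutputs(); }
    void init(int samplingRate) override { fDSP->init(samplingRate); }
    void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) override {
        fDSP->compute(count, in, out);
    }
};

// Example decoration: counts the frames rendered by the wrapped DSP.
struct frame_counter : mini_decorator {
    long frames = 0;
    using mini_decorator::mini_decorator;
    void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) override {
        frames += count;
        mini_decorator::compute(count, in, out);
    }
};

// A trivial mono DSP that multiplies its input by 0.5.
struct half_gain : mini_dsp {
    int getNumInputs() override { return 1; }
    int getNumOutputs() override { return 1; }
    void init(int) override {}
    void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) override {
        for (int i = 0; i < count; i++) out[0][i] = 0.5f * in[0][i];
    }
};
```

Because the decorated object still exposes the same API, it can be
handed to any code that expects a plain DSP, which is the property
the text relies on to reuse existing architecture files.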




[Diagram: the root dsp class with its subclasses, including
dsp_decorator, mydsp_poly, dsp_sequencer and the compiler
generated mydsp class.]

Figure 2: DSP classes diagram


2.2 Combining DSP Components 

A few additional macro construction classes, subclasses of the root
dsp class, have been defined in the public
faust/dsp/dsp-combiner.h header file:

• the dsp_sequencer class combines two DSP in sequence, assuming
that the number of outputs of the first DSP equals the number of
inputs of the second one. Its buildUserInterface method is
overridden to group the two DSP in a tabgroup, so that control
parameters of both DSP can be individually controlled⁷. Its compute
method is overridden to call each DSP compute in sequence, using an
intermediate output buffer produced by the first DSP as the input
one given to the second DSP.

• the dsp_parallelizer class combines two DSP in parallel. Its
getNumInputs/getNumOutputs methods are overridden to correctly
reflect the input/output count of the resulting DSP as the sum of
the two combined ones. Its buildUserInterface method is overridden
to group the two DSP in a tabgroup, so that control parameters of
both DSP can be individually controlled. Its compute method is
overridden to call each DSP compute, with each DSP consuming and
producing its own number of input/output audio buffers taken from
the method parameters.

⁶ https://en.wikipedia.org/wiki/Decorator_pattern
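A minimal sketch of the sequencing idea, using reduced stand-ins
for the FAUST classes. The names mini_dsp, mini_sequencer and
const_gain are assumptions made for this example, as is the
maxFrames constructor argument used to size the intermediate
buffer; the actual dsp_sequencer class may be organised
differently.

```cpp
#include <vector>

typedef float FAUSTFLOAT;

// Reduced stand-in for the FAUST dsp class (illustration only).
struct mini_dsp {
    virtual ~mini_dsp() = default;
    virtual int getNumInputs() = 0;
    virtual int getNumOutputs() = 0;
    virtual void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) = 0;
};

// Runs dsp1 then dsp2, assuming dsp1 has as many outputs as dsp2 has inputs.
struct mini_sequencer : mini_dsp {
    mini_dsp* fDSP1;
    mini_dsp* fDSP2;
    std::vector<std::vector<FAUSTFLOAT>> fStorage;  // intermediate samples
    std::vector<FAUSTFLOAT*> fChannels;             // pointers into fStorage

    mini_sequencer(mini_dsp* dsp1, mini_dsp* dsp2, int maxFrames)
        : fDSP1(dsp1), fDSP2(dsp2),
          fStorage(dsp1->getNumOutputs(),
                   std::vector<FAUSTFLOAT>(maxFrames)) {
        for (auto& channel : fStorage) fChannels.push_back(channel.data());
    }

    int getNumInputs() override { return fDSP1->getNumInputs(); }
    int getNumOutputs() override { return fDSP2->getNumOutputs(); }

    // count must not exceed the maxFrames given to the constructor.
    void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) override {
        fDSP1->compute(count, in, fChannels.data());   // in -> intermediate
        fDSP2->compute(count, fChannels.data(), out);  // intermediate -> out
    }
};

// Trivial mono DSP used to try the sequencer: multiplies by a constant.
struct const_gain : mini_dsp {
    FAUSTFLOAT fGain;
    explicit const_gain(FAUSTFLOAT g) : fGain(g) {}
    int getNumInputs() override { return 1; }
    int getNumOutputs() override { return 1; }
    void compute(int count, FAUSTFLOAT** in, FAUSTFLOAT** out) override {
        for (int i = 0; i < count; i++) out[0][i] = fGain * in[0][i];
    }
};
```

Chaining a gain of 2 and a gain of 3 this way multiplies every
input sample by 6, while the combined object still reports the
inputs of the first DSP and the outputs of the second. A parallel
combiner would instead pass disjoint parts of the in and out
pointer arrays to each DSP.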

3 Sample Accurate Control 

DSP audio languages usually deal with several timing dimensions
when treating control events and generating audio samples. For
performance reasons, systems maintain a separate audio rate for
sample generation and control rate for asynchronous message
handling.

The audio stream is most often computed by blocks, and control is
updated between blocks. To smooth control parameter changes, some
languages choose to interpolate parameter values [7] between
blocks.

In some cases control may be more finely interleaved with audio
rendering [8], and some languages [9] simply choose to interleave
control and sample computation at sample level.

Although the FAUST language permits the description of sample level
algorithms (like recursive filters etc.), FAUST generated DSP are
usually computed by blocks. Underlying audio architectures usually
give a fixed size buffer over and over to the DSP compute method,
which consumes and produces audio samples.

⁷ Typically using any UI object.

3.1 Control to DSP Link

In the current version of the FAUST generated
code, the primary connection point between the
control interface and the DSP code is simply
a memory zone. For control inputs, the
architecture layer continuously writes values
in this zone, which is then sampled by the DSP
code at the beginning of the compute method,
and the same values are used during the entire
call. Because of this simple control/DSP
connection mechanism, only the most recent
value is seen by the DSP code.

Similarly, for control outputs 8 the DSP code
inside the compute method may write several
values to the same memory zone, and only the
last value will be seen by the control
architecture layer when the method finishes.

Although this behaviour is satisfactory for 
most use-cases, some specific usages need to 
handle the complete stream of control values 
with sample accurate timing. For instance keep¬ 
ing all control messages and handling them at 
their exact position in time is critical for proper 
MIDI clock synchronisation. 

3.2 Time-Stamped Control 

The first step consists in extending the archi¬ 
tecture control mechanism to deal with time- 
stamped control events. Note that this requires 
the underlying event control layer to support 
this capability. The native MIDI API for in¬ 
stance is usually able to deliver time-stamped 
MIDI messages. 

The next step is to keep all time-stamped 
events in a time ordered data structure to be 
continuously written by the control side, and 
read by the audio side. 

Finally the sample computation has to take 
account of all queued control events, and cor¬ 
rectly change the DSP control state at succes¬ 
sive points in time. 

3.3 Slices Based DSP Computation 

With time-stamped control messages, changing 
control values at precise sample indexes on the 
audio stream becomes possible. A generic slices 
based DSP rendering strategy has been imple¬ 
mented in the timed_dsp class. 

A ring-buffer is used to transmit the stream
of time-stamped events from the control layer
to the DSP one. In the case of MIDI control
for instance, each time a MIDI message is
received, a pair containing the time-stamp
expressed in samples and the actual message
is written to the ring-buffer. In the DSP
compute method, the ring-buffer is read to
handle all messages received during the
previous audio block.

8 Using bargraph kind of UI elements.

Since control values can change several times 
inside the same audio block, the DSP compute 
cannot be called only once with the total num¬ 
ber of frames and the complete inputs/outputs 
audio buffers. The following strategy has to be 
used: 

• several slices are defined with control values
changing between consecutive slices.

• all control values having the same time-
stamp are handled together, and change
the DSP control internal state. Each slice
is then computed up to the time-stamp of the
next control change, and so on until the end
of the given audio block is reached.

• in the Figure 3 example, four slices of
c1, c2, c3 and c4 frames are successively
given to the DSP compute method,
with the appropriate part of the audio
input/output buffers. Control values
(appearing here as the [v1,v2,v3], then
[v1,v3], then [v1], then [v1,v2,v3] sets) are
changed between slices.


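[Figure 3 diagram: one audio block divided into
consecutive slices c1, c2, c3, ..., with the
control values v1, v2, v3 changed between slices]

Figure 3: Audio block slice-based computation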

Since time-stamped control messages from the
previous audio block are used in the current
block, control messages are always handled with
one audio buffer of latency.
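
The slicing strategy of this section can be
sketched as follows. This is an illustration
under stated assumptions and not the actual
timed_dsp code: timed_event, const_dsp,
compute_sliced and run_slice_example are
hypothetical names, a plain sorted vector
replaces the ring-buffer, and the DSP has a
single control and a single output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical time-stamped control event: sample offset inside the
// block, plus the new value of a single control parameter.
struct timed_event {
    int offset;
    float value;
};

// Hypothetical one-output DSP: copies its control value to the output.
struct const_dsp {
    float fControl = 0.0f;
    void compute(int count, float* output) {
        for (int i = 0; i < count; i++) output[i] = fControl;
    }
};

// Renders one block of `count` frames in slices and returns the number
// of slices. `events` must be sorted by offset. Events whose offset is
// at or beyond `count` are not applied here (they would belong to a
// later block).
inline int compute_sliced(const_dsp& dsp,
                          const std::vector<timed_event>& events,
                          int count, float* output) {
    int pos = 0;
    int slices = 0;
    std::size_t next = 0;
    while (pos < count) {
        // All events with the same (or an earlier) time-stamp are
        // handled together before the slice is computed.
        while (next < events.size() && events[next].offset <= pos) {
            dsp.fControl = events[next].value;
            next++;
        }
        // The slice ends at the next event, or at the end of the block.
        int end = (next < events.size())
                      ? std::min(events[next].offset, count)
                      : count;
        dsp.compute(end - pos, output + pos);
        pos = end;
        slices++;
    }
    return slices;
}

// Eight frames, control changes at offsets 0, 3 (twice) and 6.
inline std::vector<float> run_slice_example() {
    const_dsp dsp;
    std::vector<timed_event> events = {
        {0, 1.0f}, {3, 2.0f}, {3, 5.0f}, {6, 3.0f}};
    std::vector<float> out(8, 0.0f);
    int slices = compute_sliced(dsp, events, 8, out.data());
    assert(slices == 3);
    return out;
}
```

When two events share a time-stamp, as at
offset 3 in the example, the later one in the
queue wins, which matches the "most recent
value" behaviour of the memory zone within a
single slice.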

4 Polyphonic Instruments 

Directly programming polyphonic instruments in
Faust is perfectly possible. It is also needed
if very complex signal interactions between the
different voices have to be described. 9

But since all voices would always be computed,
this approach could be too CPU costly
for simpler or more limited needs. In this case
describing a single voice in a Faust DSP
program and externally combining several of
them with a special polyphonic instrument aware
architecture file is a better solution.
Moreover, this special architecture file takes
care of dynamic voice allocations and control
MIDI messages decoding and mapping.

9 Like sympathetic strings resonance in a physical
model of a piano for instance.

4.1 Polyphonic Ready DSP Code 

By convention FAUST architecture files with
polyphonic capabilities expect to find control
parameters named freq, gain and gate. A
metadata line such as declare nvoices "8";
giving the desired number of voices, can be
added in the source code.

In the case of MIDI control, the freq parame¬ 
ter (which should be a frequency) will be auto¬ 
matically computed from MIDI note numbers, 
gain (which should be a value between 0 and 1) 
from velocity and gate from key on/key off events. 
Thus, gate can be used as a trigger signal for any 
envelope generator, etc. 
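
As an illustration of the mapping described
above, here is a minimal sketch of the two
conversions. The source does not give the
formulas used by the architecture files, so the
usual conventions are assumed: equal temperament
with A4 (note 69) at 440 Hz, and a linear
velocity/127 mapping. The function names are
hypothetical.

```cpp
#include <cassert>
#include <cmath>

// MIDI note number to frequency in Hz (assumed: equal temperament,
// note 69 = 440 Hz).
inline double midi_to_freq(int note) {
    assert(note >= 0 && note <= 127);
    return 440.0 * std::pow(2.0, (note - 69) / 12.0);
}

// MIDI velocity to a gain between 0 and 1 (assumed: linear mapping).
inline double velocity_to_gain(int velocity) {
    assert(velocity >= 0 && velocity <= 127);
    return velocity / 127.0;
}
```

With these assumptions, note 60 (middle C) gives
about 261.63 Hz, and the gate parameter would
simply be 1 between a key on and the matching
key off, and 0 otherwise.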

4.2 Using the mydsp_poly class

The single voice has to be described by a FAUST 
DSP program, the mydsp_poly class is then used 
to combine several voices and create a poly¬ 
phonic ready DSP: 


• the faust/dsp/poly-dsp.h file contains the
definition of the mydsp_poly class used to
wrap the DSP voice into the polyphonic
architecture. This class maintains an array of
dsp type of objects, manages dynamic voice
allocations, control MIDI messages decoding
and mapping, mixing of all running
voices, and stopping a voice when its
output level decreases below a given threshold.

• as a sub-class of DSP, the mydsp_poly
class redefines the buildUserInterface
method. By convention all allocated voices
are grouped in a global Polyphonic tabgroup.
The first tab contains a Voices
group, a master like component used to
change parameters on all voices at the
same time, with a Panic button to be used
to stop running voices, 10 followed by one
tab for each voice. Graphical User Interface
components will then reflect the multi-voices
structure of the new polyphonic DSP
(Figure 4).


10 An internal control grouping mechanism has been 
defined to automatically dispatch a user interface action 
done on the master component on all linked voices. 



Figure 4: Extended multi-voices GUI interface 


The resulting polyphonic DSP object can be 
used as usual, connected with the needed audio 
driver, and possibly other UI control objects like 
OSCUI, httpdUI, etc. Having this new UI hi¬ 
erarchical view allows complete OSC control of 
each single voice and their control parameters, 
but also all voices using the master component. 

The following OSC messages reflect the same 
DSP code either compiled normally, or in poly¬ 
phonic mode (only part of the OSC hierarchies 
are displayed here): 


// Mono mode 

/0x00/0x00/vol f -10.0 
/0x00/0x00/pan f 0.0 

// Polyphonic mode 

/Polyphonic/Voices/0x00/0x00/pan f 0.0 
/Polyphonic/Voices/0x00/0x00/vol f -10.0 

/Polyphonic/Voice1/0x00/0x00/vol f -10.0
/Polyphonic/Voice1/0x00/0x00/pan f 0.0

/Polyphonic/Voice2/0x00/0x00/vol f -10.0 
/Polyphonic/Voice2/0x00/0x00/pan f 0.0 


The polyphonic instrument allocation takes
the DSP to be used for one voice, 11 the desired
number of voices, the dynamic voice allocation
state 12 and the group state which controls if
separated voices are displayed or not (Figure 4):


DSP = new mydsp_poly(dsp, 2, true, true); 

11 The DSP object will be automatically cloned in the 
mydsp_poly class to create all needed voices. 

12 Voices may be always running, or dynamically
started/stopped in case of MIDI control.

Note that a polyphonic instrument may also be
used outside of a MIDI control context. With
the following code, all voices will always be
running and can be controlled with
OSC messages for instance:

DSP = new mydsp_poly(dsp, 8, false, true); 

4.3 Controlling the Polyphonic 
Instrument 

The mydsp_poly class is also ready for MIDI 
control and can react to keyon/keyoff and pitch- 
wheel messages. Other MIDI control parameters 
can directly be added in the DSP source code. 

4.4 Deploying the Polyphonic 
Instrument 

Several architecture files and associated scripts 
have been updated to handle polyphonic instru¬ 
ments: 

As an example on OSX, the script
faust2caqt foo.dsp can be used to create
a polyphonic CoreAudio/QT application.
The desired number of voices is either declared
in a nvoices metadata or changed with the
-nvoices num additional parameter. 13 MIDI
control is activated using the -midi parameter.

The number of allocated voices can possibly 
be changed at runtime using the -nvoices pa¬ 
rameter to change the default value (so using 
./foo -nvoices 16 for instance). 

Several other scripts have been adapted using 
the same conventions. 

4.5 Polyphonic Instrument with a 
Global Output Effect 

Polyphonic instruments may be used with an 
output effect. Putting that effect in the main 
Faust code is not a good idea since it would be 
instantiated for each voice which would be very 
inefficient. This is a typical use case for the 
dsp_sequencer class previously presented with 
the polyphonic DSP connected in sequence with 
a unique global effect (Figure 5).

The command

faust2caqt inst.dsp -effect effect.dsp

has to be used, with inst.dsp and effect.dsp in
the same folder, and the number of outputs of
the instrument matching the number of inputs
of the effect. A dsp_sequencer object will be
created to combine the polyphonic instrument
in sequence with the single output effect.

13 -nvoices parameter takes precedence over the meta¬ 
data value. 


Polyphonic ready faust2xx scripts will then 
compile the polyphonic instrument and the ef¬ 
fect, combine them in sequence, and create a 
ready to use DSP. 



Figure 5: Polyphonic instrument with output effect
GUI interface: left tab window shows the polyphonic
instrument with its Voices group only, right tab window
shows the output effect.


5 MIDI Control 

MIDI control connects DSP parameters with 
MIDI messages (in both directions), and can be 
used to trigger polyphonic instruments. 

5.1 MIDI Messages Description in the 
DSP Source Code 

MIDI control messages are described as meta¬ 
data in UI elements. They are decoded by a new 
MidiUI class, subclass of UI, which parses in¬ 
coming MIDI messages and updates the appro¬ 
priate control parameters, or sends MIDI mes¬ 
sages when the UI elements (sliders, buttons...) 
are moved. 

5.2 Defined Standard MIDI messages 

A special [midi:xxx yyy...] metadata needs
to be added in the UI element. Here is the
description of three common MIDI messages:

• [midi:keyon pitch] in a slider or bargraph
will map the UI element value to
keyon velocity in the (0, 127) range. When
used with a button or checkbox, 1 will be
mapped to 127, 0 will be mapped to 0.

• [midi:keyoff pitch] in a slider or bargraph
will map the UI element value to
keyoff velocity in the (0, 127) range. When
used with a button or checkbox, 1 will be
mapped to 127, 0 will be mapped to 0.

• [midi:ctrl num] in a slider or bargraph
will map the UI element value to (or from)
the (0, 127) range. When used with a button
or checkbox, 1 will be mapped to 127, 0 will
be mapped to 0.

The full description of supported MIDI mes¬ 
sages is now part of the FAUST documentation. 

5.3 MIDI Clock Synchronization 

MIDI clock based synchronization can be used 
to slave a given FAUST program, using the sam¬ 
ple accurate control mechanism described in sec¬ 
tion 3. The following three messages have to be 
used: 

• [midi:start] in a button or checkbox will
trigger a value of 1 when a start MIDI
message is received

• [midi:stop] in a button or checkbox will
trigger a value of 0 when a stop MIDI
message is received

• [midi:clock] in a button or checkbox will
deliver a sequence of successive 1 and 0
values each time a clock MIDI message is
received, seen by FAUST code as a square
command signal, to be used to compute
higher level information.

A typical Faust program will then use the
MIDI clock command signal to possibly compute
the Beats Per Minute (BPM) information,
or for any synchronization need it may have.
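
As an illustration of the BPM computation
mentioned above, here is a minimal sketch of the
arithmetic. It relies on the MIDI convention of
24 clock messages per quarter note; the function
names are hypothetical and the clock rate is
assumed to have been measured already (for
instance by counting clock messages over one
second).

```cpp
#include <cassert>

// Number of MIDI clock messages sent per quarter note.
const double kClocksPerQuarterNote = 24.0;

// Converts a measured clock rate (messages per second) to BPM.
inline double clock_rate_to_bpm(double clocks_per_second) {
    assert(clocks_per_second >= 0.0);
    return clocks_per_second * 60.0 / kClocksPerQuarterNote;
}

// Inverse conversion: clock messages per second for a given tempo.
inline double bpm_to_clock_rate(double bpm) {
    assert(bpm >= 0.0);
    return bpm * kClocksPerQuarterNote / 60.0;
}
```

For example, a tempo of 120 BPM corresponds to
48 clock messages per second.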

Here is a simple example of a sinusoid generated
with a frequency controlled by the MIDI
clock stream, 14 and starting/stopping when
receiving the MIDI start/stop messages:

import("stdfaust.lib");

// square signal (1/0), changing state
// at each received clock
clocker = checkbox("MIDI clock [midi:clock]");

// ON/OFF button controlled
// with MIDI start/stop messages
play = checkbox("On/Off [midi:start][midi:stop]");

// detect front
front(x) = (x-x') != 0.0;

// count number of peaks during one second
freq(x) = (x-x@ma.SR) : + ~ _;

process = os.osc(8*freq(front(clocker))) * play;

14 Using an external MIDI clock generator and chang¬ 
ing its tempo allow to precisely control the sinusoid fre¬ 
quency. 


Note that the described sample accurate 
MIDI clock synchronization model can currently 
only be used at input level. Because of the 
simple memory zone based connection point be¬ 
tween the control interface and the DSP code, 
output controls (like bargraph) cannot generate 
a stream of control values. Thus a reliable MIDI 
clock generator cannot be implemented with the 
current approach. 

5.4 MIDI Classes 

A midi base class defining MIDI messages
decoding/encoding methods has been developed.
A midi_handler subclass implements actual
decoding. Several concrete implementations based
on native API have been written (Figure 6) and
can be found in the faust/midi folder.

Depending on the used native MIDI API, 
event time-stamps are either expressed in ab¬ 
solute time or in frames. They are converted 
to offsets expressed in samples relative to the 
beginning of the audio buffer. 
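
For the absolute time case, the conversion can
be sketched as follows. This is only an
illustration: the function name, the use of
seconds as the time unit, and the clamping of
out-of-range events to the buffer limits are
assumptions, not the behaviour of a specific
handler in the faust/midi folder.

```cpp
#include <cassert>
#include <cmath>

// Converts an absolute event time into a sample offset relative to the
// beginning of the audio buffer. Times are in seconds (assumption).
// Offsets falling outside the buffer are clamped to its first or last
// frame (assumption).
inline int time_to_sample_offset(double event_time,
                                 double buffer_start_time,
                                 double sample_rate,
                                 int buffer_size) {
    assert(sample_rate > 0.0 && buffer_size > 0);
    long offset = std::lround((event_time - buffer_start_time) * sample_rate);
    if (offset < 0) offset = 0;
    if (offset > buffer_size - 1) offset = buffer_size - 1;
    return static_cast<int>(offset);
}
```

At 48 kHz, an event received 1 ms after the
start of the buffer maps to an offset of 48
samples.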

Connected with the new MidiUI class, sub¬ 
class of UI, they allow a given DSP to be con¬ 
trolled with incoming MIDI messages or possi¬ 
bly send MIDI messages when its internal con¬ 
trol state changes. 

[Class diagram: midi_handler derives from midi;
jack_midi_handler, rt_midi, bela_midi and
juce_midi derive from midi_handler]

Figure 6: MIDI classes diagram

In the following piece of code, a MidiUI object
is created and connected to a rt_midi [5]
MIDI message handler, then given as parameter
to the standard buildUserInterface to control
the DSP parameters:

rt_midi midi_handler("MIDI"); 

MidiUI midiinterface(&midi_handler); 
DSP->buildUserInterface(&midiinterface); 

6 Deployment 

The extended architecture files have been
presented and used in the context of statically
generated and compiled DSP, that is generating
C++ code from FAUST, then compiling the
resulting code in executable applications or
plugins. They have been deployed in several
faust2xx scripts and especially in faust2api
presented in [11].

Note that they can also be used with DSP
dynamically generated by libfaust, as in
particular in the FaustLive [13] standalone
just-in-time FAUST compiler, or in the
faustgen~ Max/MSP external object.

7 Conclusion 

The sample accurate control model could easily 
be adapted to work with MIDI controllable plu¬ 
gins like LV2 instruments 16 so that MIDI clock 
synchronization could be used. 

Expanding the polyphonic and sample accurate
control model over the network in the
libfaustremote [4] library is still in progress.

As a general concluding remark, a deeper re¬ 
thinking of the control/DSP connection model 
in the FAUST compiled code will have to be 
done. As explained in section 3, control and 
DSP computation interaction is somewhat lim¬ 
ited in the current model of the generated code. 

The described solution stays at the architec¬ 
ture layer level with some limitations. Although 
sample accurate control for inputs can be done 
using the presented slices based DSP computa¬ 
tion, this strategy does not help to properly re¬ 
trieve the stream of control output values. 

A cleaner approach would be to extend the
model of control signals to be a list of time-
stamped values, so that the compute method
would handle a slice of time-stamped input
controls (kept from the previous block), and
possibly produce a slice of time-stamped output
controls. This more general strategy still has
to be developed at the code generation level.

Abstract

We introduce faust2api, a tool to generate cus¬ 
tom DSP engines for Android and iOS using the 
Faust programming language. Faust DSP objects 
can easily be turned into MIDI-controllable poly¬ 
phonic synthesizers or audio effects with built-in sen¬ 
sors support, etc. The various elements of the DSP 
engine can be accessed through a high-level API, 
made uniform across platforms and languages. 

This paper provides technical details on the im¬ 
plementation of this system as well as an evaluation 
of its various features. 

Keywords 

Faust, iOS, Android, Mobile Instruments 

1 Introduction 

Mobile devices (smart-phones, tablets, etc.)
have been used as musical instruments for the
past ten years, both in the industry (e.g.,
GarageBand 1 for iPad, Smule’s apps, 2
moForte’s GeoShred, 3 etc.), and in the academic
community ([Tanaka, 2004], [Geiger, 2006],
[Gaye et al., 2006], [Essl and Rohs, 2009] and
[Wang, 2014]).

Implementing real-time Digital Signal Processing
(DSP) engines from scratch on mobile
platforms can be hard using standard audio
APIs provided with common operating systems
(we’ll only cover iOS and Android here). Indeed,
CoreAudio on iOS and OpenSL ES on
Android are relatively low-level APIs offering
customization possibilities not needed by most
audio app developers. Fortunately, there exist
several third party cross-platform APIs to
work with real-time audio on mobile devices at a
higher level (e.g., SuperPowered, 4 JUCE, 5 etc.).
Additionally, several open-source tools make it
possible to use objects written in common
computer music languages such as PureData 6
(libpd [Brinkmann et al., 2011]) and Csound 7
(Mobile Csound Platform (MCP) [Lazzarini et al.,
2012]) on mobile platforms.

1 http://www.apple.com/ios/garageband. All
the URLs in this paper were verified on 01/26/17.
2 https://www.smule.com
3 http://www.moforte.com/geoshredapp
4 http://superpowered.com
5 https://www.juce.com

Similarly, we introduced faust2android
in a previous publication [Michon, 2013]: a
tool that turns Faust 8 [Orlarey et al.,
2009] code into a fully operational Android
application. faust2android is based on
faust2api [Michon et al., 2015], which
turns a Faust program into a cross-
platform API usable on Android and iOS to
carry out various kinds of real-time audio
processing tasks.

In this paper, we present a completely re¬ 
designed version of faust2api offering the 
same features on Android and iOS: 

• polyphony and MIDI support, 

• audio effects chains, 

• built-in sensors support, 

• low latency audio, 

• etc. 

First, we’ll give an overview of how 
faust2api works. Then, technical details on 
the implementation of this system will be pro¬ 
vided. Finally, we’ll evaluate it and present fu¬ 
ture directions for this project. 

2 Overview 
2.1 Basics 

At its highest level, faust2api is a command 
line program taking a Faust code as its main 
argument and generating a package containing 
a series of files implementing the DSP engine. 
Various flags can be used to customize the API. 
The only required flag is the target platform: 

6 https://puredata.info 
7 http://www.csounds.com 
8 http://faust.grame.fr

faust2api -ios myCode.dsp

will generate a DSP engine for iOS and 

faust2api -android myCode.dsp 

will generate a DSP engine for Android. 

The content of each package is quite different
between these two platforms (see §3), but the
format of the API itself remains very similar
(see Figure 1). The iOS DSP engines
generated with faust2api consist of a large
C++ object (DspFaust) accessible through a
separate header file. This object can be
conveniently instantiated and used in any C++ or
Objective-C code in an app project. A typical
“life cycle” for a DspFaust object can be

DspFaust *dspFaust = new DspFaust(SR, blockSize);
dspFaust->start();
dspFaust->stop();
delete dspFaust;

start() launches the computation of the
audio blocks and stop() stops (pauses) the
computation. These two methods can be called
as many times as needed. The constructor
takes the sampling rate and the
block size, and is used to instantiate the audio
engine. While the configuration of the audio
engine is very limited at the API level (only
these two parameters can be configured through
it), lots of flexibility is given to the
programmer within the Faust code. For example, if
the Faust object doesn’t have any input, then
no audio input will be instantiated in the audio
engine, etc.

The value of the different parameters of a 
Faust object can be easily modified once the 
DspFaust object is created and is running. 
For example, the freq parameter of the sim¬ 
ple Faust code 

f = nentry("freq", 440, 50, 1000, 0.01);
process = osc(f);

can be modified simply by calling

dspFaust->setParamValue("freq", 440);

Faust user-interface elements (nentry here)
are ignored by faust2api and simply used as
a way to declare parameters controllable in the
API. API packages generated by faust2api
also contain a markdown documentation
providing information on how to use the API as
well as a list of all the parameters controllable
with setParamValue().

The structure of the DSP engine package 
is quite different for Android since it contains 
both C++ and JAVA files (see §3). Otherwise, 
the same steps can be used to work with the 
DspFaust object. 


2.2 MIDI Support 

MIDI support can be easily added to a 
DspFaust object simply by providing the 
-midi flag when calling faust2api. MIDI 
support works the same way on Android and 
iOS: all MIDI devices connected to the mobile 
device before the app is launched can control the 
Faust object, and any new device connected 
while the app is running will also be able to 
control it. 

Standard Faust MIDI meta-data 9 can be
used to assign MIDI CCs to specific parameters.
For example, the freq parameter of the
previous code could be controlled by MIDI CC
52 simply by writing

f = nentry("freq[midi:ctrl 52]", 440, 50, 1000, 0.01);

2.3 Polyphony 

Faust objects can be conveniently turned into 
polyphonic synthesizers simply by specifying 
the maximum number of voices of polyphony 
when calling faust2api using the -nvoices 
flag. In practice, only active voices are allocated 
and computed, so this number is just used as a 
safeguard. 

As used for many years by the various 
tools for making Faust synthesizers, such as 
faust2pd, compatibility with the -nvoices 
option requires the freq, gain and gate pa¬ 
rameters to be defined. faust2api automati¬ 
cally takes care of converting MIDI note num¬ 
bers to frequency values in Hz for freq, MIDI 
velocity to linear amplitude-gain for gain, and 
note-on (1) and note-off (0) for gate: 

f = nentry("freq", 440, 50, 1000, 0.01);
g = nentry("gain", 1, 0, 1, 0.01);
t = button("gate");
process = osc(f)*g*t;

Here, t could be used to trigger an envelope 
generator, for example. In such a case, the voice 
would stop being computed only after t is set 
to 0 and the tail-off amplitude becomes smaller 
than -60dB (configurable using macros in the 
application code). 
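
To make the -60dB figure concrete, the following
sketch converts a level in dB to a linear
amplitude and tests a voice level against it.
The function names are hypothetical and this is
not the code used by the generated engine; only
the standard 20*log10 amplitude convention is
assumed.

```cpp
#include <cassert>
#include <cmath>

// Converts a level in dB to a linear amplitude (20*log10 convention).
inline double db_to_linear(double db) {
    return std::pow(10.0, db / 20.0);
}

// Returns true when a voice whose gate is 0 has decayed below the
// threshold and could stop being computed. `level` is a linear
// amplitude estimate (hypothetical helper).
inline bool voice_is_silent(double level, double threshold_db) {
    assert(level >= 0.0);
    return level < db_to_linear(threshold_db);
}
```

With this convention, -60dB corresponds to a
linear amplitude of 0.001.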

A wide range of methods is accessible to work
with voices. A “typical” life cycle for a MIDI
note can be

long voiceAddress = dspFaust->keyOn(note, velocity);
dspFaust->setVoiceParamValue("param", voiceAddress, paramValue);
dspFaust->keyOff(note);

9 http://faust.grame.fr/images/faust-quick-reference.pdf

setVoiceParamValue() can be used to
change the value of a parameter for a specific
voice.

Alternatively, voices can be allocated without
specifying a note number and a velocity:

long voiceAddress = dspFaust->newVoice();
dspFaust->setVoiceParamValue("param", voiceAddress, paramValue);
dspFaust->deleteVoice(voiceAddress);

For example, this can be very convenient to 
associate voices to specific fingers on a touch¬ 
screen. 

When MIDI support is enabled in 
faust2api, MIDI events will automati¬ 
cally interact with voices. Thus, if a MIDI 
keyboard is connected to the mobile device, 
it will be able to control the Faust object 
without additional configuration steps. 

2.4 Adding Audio Effects 

In most cases, effects don’t need to be
re-implemented for each voice of polyphony and
can be placed at the end of the DSP chain.
faust2api accepts a Faust object
implementing the effects chain to be connected
to the output of the polyphonic synthesizer.
This can be done simply by giving the -effect
flag followed by a Faust effects chain file name
(e.g., effect.dsp) when calling faust2api:

faust2api -android -nvoices 12 -effect effect.dsp synth.dsp

The parameters of the effect automatically
become available in the DspFaust object and
can be controlled using the setParamValue()
method.

2.5 Working With Sensors 

The built-in accelerometer and gyroscope of a 
mobile device can be easily assigned to any of 
the parameters of a Faust object using the acc 
or gyr meta-data: 

g = nentry("gain[acc: 0 0 -10 0 10]", 1, 0, 1, 0.01);

Complex mappings can be implemented using 
this system. This feature is not documented 
here, but more information about it is available 
in [Michon, 2017]. This reference also provides 
a series of tutorials on how to use faust2api. 

3 Implementation 

faust2api takes advantage of the modularity
of the Faust architecture system to generate
its custom DSP engines [Letz et al., 2017]. For
example, turning a monophonic Faust synthesizer
into a polyphonic one can be done in a
simple generic way. Both on Android and iOS,
faust2api generates a large C++ file
implementing all the features used by the high
level API. On iOS, this API is accessed through
a C++ header file that can be conveniently
included in any C++ or Objective-C code. On
Android, a JAVA interface allows interaction
with the native (C++) block. The DSP C++
code is the same for all platforms (see Figure 2)
and is wrapped into an object implementing
the polyphonic synthesizer followed by
the effects chain (assuming that the -nvoices
and -poly 2 options were used during
compilation).

In this section, we provide more information 
on the architecture of DSP engines generated 
by faust2api for Android and iOS. 

3.1 iOS 

The global architecture of API packages
generated by faust2api is relatively simple on
iOS since C++ code can be used directly in
Objective-C (which is one of the two languages
used to make iOS applications along
with Swift). The Faust synthesizer object
gets automatically connected to the audio
engine implemented using CoreAudio. As
explained in the previous section, the sampling
rate and the buffer length are defined by the
programmer when the DspFaust object is
created. The number of instantiated inputs and
outputs is determined by the Faust code. By
default, the system deactivates gain correction
on the input but this can be changed using a
macro in the including source code.

MIDI support is implemented using RtMidi
[Scavone and Cook, 2005], which is automatically
added to the API if the -midi
option was used for compilation. Alternatively,
programmers might choose to use the
propagateMidi() method to send raw MIDI
events to the DspFaust object in case they
would like to implement their own MIDI
receiver.

The same approach can be used for built-
in sensors using the propagateAcc() and
propagateGyr() methods.

3.2 Android 

Android applications are primarily written in
JAVA. However, despite the fact that the Faust
compiler can generate JAVA code, it is not a
good choice for real-time audio signal processing
[Michon, 2013]. Thus, DSP packages generated
by faust2api contain elements implemented
both in JAVA and C++.

Basic Elements

DspFaust: Constructor
~DspFaust: Destructor
start: Start audio processing
stop: Stop audio processing
isRunning: True if processing is on
getJSONUI: Get UI JSON description
getJSONMeta: Get Metadata JSON

Polyphony

keyOn: Start a new note
keyOff: Stop a note
newVoice: Start a new voice
deleteVoice: Delete a voice
allNotesOff: Terminate all active voices
setVoiceParamValue: Set param value for a specific voice
getVoiceParamValue: Get param value for a specific voice

Parameters Control

getParamsCount: Get number of params
setParamValue: Set param value
getParamValue: Get param value
getParamAddress: Get param address
getParamMin: Get param min value
getParamMax: Get param max value
getParamInit: Get param init value
getParamTooltip: Get param description

Other Functions

propagateMidi: Propagate raw MIDI messages
propagateAcc: Propagate raw accel data
setAccConverter: Set accel mapping
propagateGyr: Propagate raw gyro data
setGyrConverter: Set gyro mapping
getCPULoad: Get CPU load

Figure 1: Overview of the API functions.

The native portion of the package (C++) im¬ 
plements the DSP elements as well as the au¬ 
dio engine (see Figure 2) which is based on 
OpenSL ES. 10 The audio engine is configured 
to have the same behavior as on iOS. Native 
elements are wrapped into a shared library ac¬ 
cessible in JAVA through a Java Native Inter¬ 
face (JNI) using the Android Native Develop¬ 
ment Kit (NDK). 11 

MIDI receivers can only be created in JAVA 
on Android (and only since Android API 23), 
thus MIDI support is implemented in the JAVA 
portion. Like on iOS, the propagateMidi () 
method can be used to implement custom MIDI 
receivers. 

While raw sensor data can be retrieved in C++ 
on Android, we decided to implement a system 
similar to the one used for MIDI, where raw 
sensor data are pushed from the JAVA layer to 
the native one. 


10 https://www.khronos.org/opensles
11 https://developer.android.com/ndk/index.html


4 Evaluation 

4.1 Use in Other Frameworks 

faust2api is now used at the core of
faust2android [Michon, 2013] and
faust2ios. It is also used as the basis
for our new SmartKeyboard 12 tool (currently
under development), which generates musical
applications with advanced user interfaces
on Android and iOS. Figure 3 presents Nuance
[Michon et al., 2016], a musical instrument
based on faust2api and SmartKeyboard.

4.2 Audio Latency 

We measured the “touch-to-sound” and the 
“round-trip” audio latency of apps based on 
faust2api for various devices using the tech¬ 
niques described by Google on their website. 13 
The “touch-to-sound” latency is the time it 
takes to generate a sound after a touch event 
was registered on the touch screen of the de¬ 
vice. The “round-trip” latency is the time it 
takes to process an analog signal recorded by 
the built-in microphone or acquired by the line 
input. 

Latency performance hasn’t improved on iOS
(see Table 1) compared to our previous study
[Michon et al., 2015], except for newer devices
such as the iPad Pro. On the other hand, Android
made huge progress (see Table 2), thanks
to tremendous work carried out by Google, as
well as our completely rewritten audio engine.

12 https://ccrma.stanford.edu/~rmichon/SmartKeyboard

13 https://source.android.com/devices/audio/latency_measurements.html

Figure 2: Overview of DSP engines generated with faust2api.

Figure 3: Nuance: a musical instrument using
faust2api.

Table 2 shows that a “reasonable” latency
can only be achieved with the latest version
of Android, which confirms the measurements
made by Google. 14 Unfortunately, such performance
can only be attained on a few devices
supported by Google, and configured with
a specific sampling rate and buffer length.
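
As a rough illustration of why sampling rate and buffer length matter, the delay contributed by a single audio buffer is its length in frames divided by the sampling rate. The Python sketch below uses hypothetical values (192 and 960 frames at 48 kHz) that are not taken from the measurements above; a measured round-trip latency also includes driver and converter delays that this arithmetic ignores.

```python
def buffer_duration_ms(frames, sample_rate_hz):
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


# Hypothetical configurations, chosen only to illustrate the arithmetic.
small = buffer_duration_ms(192, 48000)
large = buffer_duration_ms(960, 48000)
print(f"192 frames: {small:.1f} ms, 960 frames: {large:.1f} ms")
```

A shorter buffer lowers this contribution but leaves less time to compute each block, which is one reason a low-latency configuration is device specific.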

5 Future Directions 

We believe that faust2api has reached a mature
and stable state. However, many elements

14 https://source.android.com/devices/audio/latency_measurements.html#measurements


Device       Touch to Sound   Round Trip

iPhone6      30 ms            13 ms
iPhone5      36 ms            13 ms
iPodTouch    36 ms            13 ms
iPadPro      28 ms            12 ms
iPadAir2     35 ms            13 ms
iPad2        45 ms            15 ms

Table 1: Audio latency for different iOS devices
using faust2api.


Device             Touch to Sound   Round Trip   OS

HTC Nexus 9        29 ms            15 ms        7.0
Huawei Nexus 6p    31 ms            17 ms        7.0
Asus Nexus 7       37 ms            48 ms        7.0
Samsung Gal. S5    37 ms            48 ms        5.0

Table 2: Audio latency for different Android
devices using faust2api.


can be improved: 

First, while basic MIDI support is provided,
we haven’t tested it with complex MIDI interfaces
such as those using the Multidimensional
Polyphonic Expression (MPE) standard
(e.g. LinnStrument, 15 ROLI Seaboard, 16 etc.).

15 http://www.rogerlinndesign.com/linnstrument.html

16 https://roli.com/products/ 
Currently, specific parameters of the various
elements of the API (such as audio engine, MIDI
behavior, etc.) can only be configured using
source-code macros. We would like to provide
a more systematic and in some cases dynamic
way of controlling them.

Finally, we plan to add more targets to
faust2api for various kinds of platforms to
help design elements such as audio plug-ins,
standalone applications, and embedded systems.

6 Conclusions 

Faust gives access to dozens of high-quality
open source sound processors and generators,
ranging from specialized types of filters
to virtual analog oscillators. Thanks to
faust2api, all these elements can be easily
embedded and controlled in any Android or iOS
app.

One of the new experimental features of the
Faust compiler makes it possible to select at run
time the portions of a Faust object that are computed.
This makes it possible to create very large objects
embedding multiple synthesizers and effects.
We believe that this feature, in combination
with faust2api, will make it possible to design
complex Faust-based DSP engines for a wide
range of platforms.
Abstract

We present a completely reorganized set of signal
processing libraries for the Faust programming language.
They aim at providing a clearer classification
of the different Faust DSP functions, as well as better
documentation. After giving an overview of this
new system, we provide technical details about its
implementation. Finally, we evaluate it and give
ideas for future directions.

Keywords 

Faust, Digital Signal Processing, Computer Music 
Programming Language 

1 Introduction 

Faust is a functional programming language for
real-time Digital Signal Processing (DSP) targeting
high-performance audio applications and
plug-ins for a wide range of platforms and standards
[Orlarey et al., 2009].

One of Faust’s strengths lies in its DSP libraries,
which provide a large collection of reference
implementations ranging from filters to
audio effects and sound generators.

When Faust was created, it had a limited
number of DSP libraries that were
organized in a “somewhat” coherent way:
math.lib contained mathematical functions,
and music.lib everything else (filters, effects,
generators, etc.). Later, the libraries
filter.lib, oscillator.lib, and
effect.lib were developed [Smith, 2008],
[Smith, 2012], which had significant overlap in
scope with music.lib.

A year ago, we decided to fully reorganize the 
Faust libraries to 

• provide more clarity, 

• organize functions by category, 

• standardize function names, 

• create a dynamic documentation of their 
content. 


Yann Orlarey 

GRAME 

Centre National de Creation Musicale 
11 Cours de Verdun (Gensoul) 

69002, Lyon 
France 

orlarey@grame.fr

In this paper, we give an overview of the organization
of the new Faust libraries, as well
as technical details about their implementation.
We then evaluate them through the results of a
workshop on Faust that was taught at the Center
for Computer Research in Music and Acoustics
(CCRMA) at Stanford University in 2016,
and we provide ideas for future directions.

2 Global Organization and 
Standards 

2.1 Overview 

The new Faust libraries 1 are organized in different
files presented in Figure 1. Each file
contains several subcategories that make it easy
to find functions for specific uses. While some
libraries host fewer functions than others, they
were created to be easily updated with new elements.
The content of the old (and now deprecated)
Faust libraries was spread across these
new files, making backward compatibility a bit
hard to implement (see §2.4).

More specifically, the old music.lib was
removed since it contained much overlap in
scope with oscillator.lib, effect.lib,
and filter.lib.

effect.lib was divided into several
“specialized” libraries: compressors.lib,
misceffects.lib, phaflangers.lib,
reverbs.lib, and vaeffects.lib. Similarly,
the content of oscillator.lib
is now spread between noises.lib and
oscillators.lib. Finally, demos.lib hosts
demo functions, typically adding user-interface
elements with illustrative parameter defaults.
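
The split described above can be summarized as a small lookup table. The following Python sketch only restates the mapping given in this section; it is not part of the Faust distribution, and the names SPLIT and new_libraries are hypothetical.

```python
# Old library -> new libraries that received its content (from Section 2.1).
SPLIT = {
    "effect.lib": [
        "compressors.lib",
        "misceffects.lib",
        "phaflangers.lib",
        "reverbs.lib",
        "vaeffects.lib",
    ],
    "oscillator.lib": ["noises.lib", "oscillators.lib"],
}


def new_libraries(old_name):
    """Return the new libraries replacing a deprecated one, or an empty list."""
    return SPLIT.get(old_name, [])


print(new_libraries("oscillator.lib"))
```

Libraries whose redistribution is not spelled out in the text (for instance music.lib) are deliberately left out of the table rather than guessed.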

2.2 Prefixes 

Each Faust library has a recommended
two-letter namespace prefix defined in the
“meta library” stdfaust.lib. For example,
stdfaust.lib contains the lines

1 http://faust.grame.fr/library.html. All 
the URLs in this paper were verified on 01/30/17. 
analyzer.lib
- Amplitude Tracking
- Spectrum-Analyzers
- Mth-Octave Spectral Level
- Arbitrary-Crossover Filter-Banks and Spectrum Analyzers

basics.lib
- Conversion Tools
- Counters and Time/Tempo Tools
- Array Processing and Pattern Matching
- Selectors (Conditions)
- Other Misc Functions

compressors.lib
Compressors and limiters library.

delays.lib
- Basic Delay Functions
- Lagrange Interpolation
- Thiran Allpass Interpolation

demos.lib
- Analyzers
- Filters
- Effects
- Generators

envelopes.lib
Envelope generators library.

filters.lib
- Basic Filters
- Comb Filters
- Direct-Form Sections
- Direct-Form Second-Order Biquad Sections
- Ladder/Lattice
- Virtual Analog Filters
- Simple Resonator
- Butterworth Filters
- Elliptic (Cauer) Filters
- Filters for Parametric Equalizers (Shelf, Peaking)
- Arbitrary-Crossover Filter-Banks

maths.lib
- Constants
- Functions

misceffects.lib
- Dynamic
- Filtering
- Time Based
- Pitch Shifting
- Meshes

noises.lib
Noise generators library.

oscillators.lib
- Wave-Table-Based Oscillators
- LFOs
- Low Frequency Sawtooths
- Bandlimited Sawtooth
- Bandlimited Pulse, Square, and Impulse Trains
- Filter-Based Oscillators
- Waveguide-Resonators

phaflangers.lib
Phasers and flangers library.

reverbs.lib
Reverbs library.

routes.lib
Signal routing library.

signals.lib
Misc signal tools library.

spats.lib
Spatialization tools library.

synths.lib
Misc synthesizers library.

vaeffects.lib
Virtual analog effects library.

Figure 1: Overview of the organization of the new Faust libraries. 


fi = library("filters.lib");
os = library("oscillators.lib");

so that functions from oscillators.lib
can be invoked using the os prefix and functions
from filters.lib through fi:

import("stdfaust.lib");
process = os.sawtooth(440) : fi.lowpass(2,2000);

It is of course possible to avoid prefixes using 
the import directive: 
import("filters.lib");
import("oscillators.lib");
process = sawtooth(440) : lowpass(2,2000);

The libraries presently avoid name collisions, 
so it is possible to load all functions from all 
libraries into one giant namespace soup: 

import("all.lib");
process = sawtooth(440) : lowpass(2,2000);

Alternatively, all Faust-defined functions
can be loaded into a single namespace separate
from the user’s namespace:

sf = library("all.lib"); // standard faust namespace
process = sf.sawtooth(440) : sf.lowpass(2,2000);

Further details can be found in the documentation
for the libraries. 2

2.3 Standard Functions 

The Faust libraries implement dozens of functions,
and it can be hard for new users to find
standard elements for basic uses. For example,
filter.lib contains seven different lowpass
filters, and it’s probably not obvious to someone
with little experience in signal processing
which one should be used.

To address this problem, the new Faust libraries
declare “standard” functions (see Figure 2)
that are automatically added to the library
documentation. 3 Standard functions are
organized by categories, independently from
the library where they are declared (see §3).
They should cover the needs of most users used
to computer music programming environments
such as PureData, 4 SuperCollider, 5 etc.

2.4 Backward Compatibility 

With such major changes, providing a decent
level of backward compatibility proved to be
quite complicated. The old Faust libraries
(effect.lib, filter.lib, math.lib,
music.lib and oscillator.lib) can still
be used and will remain accessible for about
one year.

In order to make this possible, we had to find
a way to make them coexist with the new libraries
without creating conflicts. Thus, we decided
to use plurals for the names of the new

2 http://faust.grame.fr/library.html 

3 http://faust.grame.fr/library.html#standard-functions.

4 https://puredata.info. 

5 http://supercollider.github.io. 


libraries, making it possible to use our new
filters.lib alongside the old filter.lib, for
example.

If one of the old libraries is imported in
a Faust program, the Faust compiler now
issues a warning indicating the use of a deprecated
library.

2.5 Other “Non-Standard” Libraries 

A few “non-standard” libraries for very specific
applications remain accessible but are not documented
(see §3):

• hoa.lib: high order ambisonics library

• instruments.lib: library used by the
Faust-STK [Michon and Smith, 2011]

• maxmsp.lib: compatibility library for
Max/MSP

• tonestacks.lib: tonestack emulation
library used by Guitarix 6

• tubes.lib: guitar tube emulation library
used by Guitarix

3 Automatic Documentation 

The new Faust libraries use a new automatic
documentation system based on the faust2md
(Faust to MarkDown) script, which is now part
of the Faust distribution. It makes it easy
to write MarkDown comments within the code
of the libraries by following the conventions
described below.

Library headers and descriptions can be cre¬ 
ated with 

//##### Library Name #####
// Some Markdown text.
//########################

Libraries can be organized into sections using 
the following syntax: 

//===== Section Name =====
// Some Markdown text.
//========================

Each function in a library should be docu¬ 
mented as such: 

//----- Function Name -----
// Some Markdown text.
//-------------------------

The libraries documentation can be conve¬ 
niently generated by running: 

make doclib 

6 http://guitarix.org. 
Analysis Tools
an.amp_follower     Amplitude follower
an.mth_oct[...]     Octave analyzers

Basic Elements
ba.beat             Pulse generator
si.block            Block a signal
ba.bpf              Break Point Function
si.bus              Bus of n signals
ba.bypass1          Bypass (mono)
ba.bypass2          Bypass (stereo)
ba.count            Counts in a list
ba.countdown        Samples count down
ba.countup          Samples count up
de.delay            Integer delay
de.fdelay           Fractional delay
ba.impulsify        Signal to impulse
ba.sAndH            Sample and hold
ro.cross            Cross n signals
si.smoo             Smoothing
si.smooth           Controllable smoothing
ba.take             Element from a list
ba.time             Timer

Conversion
ba.db2linear        dB to linear
ba.linear2db        Linear to dB
ba.midikey2hz       MIDI key to Hz
ba.pole2tau         Pole to t60
ba.samp2sec         Samples to seconds
ba.sec2samp         Seconds to samples
ba.tau2pole         t60 to pole

Effects
ve.autowah          Auto-wah
co.compressor       Compressor
ef.cubicnl          Distortion
ve.crybaby          Crybaby
ef.echo             Echo
pf.flanger          Flanger
ef.gate_mono        Signal gate
co.limiter          Limiter
pf.phaser2          Phaser
re.fdnrev0          Reverb (FDN)
re.freeverb         Reverb (Freeverb)
re.jcrev            Reverb (simple)
re.zita_rev1        Reverb (Zita)
sp.panner           Panner
ef.transpose        Pitch shift
sp.spat             Panner
ef.speakerbp        Speaker simulator
ef.stereo_width     Stereo width
ve.vocoder          Vocoder
ve.wah4             Wah

Envelopes
en.adsr             ADSR envelope
en.ar               AR envelope
en.asr              ASR envelope
en.smoothEnv        Exponential envelope

Filters
fi.bandpass         Bandpass (Butterworth)
fi.resonbp          Bandpass (resonant)
fi.bandstop         Bandstop (Butterworth)
fi.tf2              Biquad Filters
fi.allpass_fcomb    Comb (allpass)
fi.fb_fcomb         Comb (feedback)
fi.ff_fcomb         Comb (feedforward)
fi.dcblocker        DC blocker
fi.filterbank       Filterbank
fi.fir              FIR (arbitrary order)
fi.high_shelf       High shelf
fi.highpass         Highpass (Butterworth)
fi.resonhp          Highpass (resonant)
fi.iir              IIR (arbitrary order)
fi.levelfilter      Level filter
fi.low_shelf        Low shelf
fi.lowpass          Lowpass (Butterworth)
fi.resonlp          Lowpass (resonant)
fi.notchw           Notch filter
fi.peak_eq          Peak equalizer

Generators
os.impulse          Impulse
os.imptrain         Impulse train
os.phasor           Phasor
no.pink_noise       Pink noise
os.pulsetrain       Pulse train
os.lf_imptrain      Low-freq pulse train
os.sawtooth         Sawtooth wave
os.lf_saw           Low-freq sawtooth
os.osc              Sine (filter-based)
os.oscsin           Sine (table-based)
os.square           Square wave
os.lf_square        Low-freq square
os.triangle         Triangle
os.lf_triangle      Low-freq triangle
no.noise            White noise

Synths
sy.additiveDrum     Additive drum
sy.dubDub           Filtered sawtooth
sy.combString       Comb string
sy.fm               FM
sy.sawTrombone      Lowpassed sawtooth
sy.popFiltPerc      Popping filter



Figure 2: Standard Faust functions with their corresponding prefix when used with 
stdfaust.lib. 
at the root of the Faust distribution. This
will generate an HTML and a PDF file in the
/documentation folder using pandoc. 7
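
To make the comment conventions of this section more concrete, here is a minimal Python sketch that turns decorated comment blocks into Markdown lines. It is not the faust2md script: the regular expressions, the heading levels assigned to each marker character, and the name comments_to_markdown are all assumptions made for illustration.

```python
import re

# Marker character -> Markdown heading level (an assumption of this sketch,
# not taken from the real faust2md script).
LEVELS = {"#": 1, "=": 2, "-": 3}

# "//##### Title #####" style header, and a bare closing rule "//########".
HEADER = re.compile(r"^//\s*([#=-])\1+\s*(.*?)\s*\1+\s*$")
RULE = re.compile(r"^//\s*([#=-])\1+\s*$")


def comments_to_markdown(source):
    """Convert decorated `//` comment blocks into a list of Markdown lines."""
    out = []
    in_block = False
    for line in source.splitlines():
        header = HEADER.match(line)
        if RULE.match(line):
            in_block = False  # a bare rule closes the current block
        elif header and header.group(2):
            level = LEVELS[header.group(1)]
            out.append("#" * level + " " + header.group(2))
            in_block = True
        elif in_block and line.startswith("//"):
            out.append(line[2:].strip())  # body text of the block
        else:
            in_block = False  # ordinary code ends the block
    return out


example = """//##### Library Name #####
// Some Markdown text.
//########################
foo = 1;
"""
print("\n".join(comments_to_markdown(example)))
```

Only text between a titled header and its closing rule is collected, so ordinary code comments elsewhere in a library are ignored.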

4 Evaluation and Future Directions 

The new Faust libraries were beta tested during
the CCRMA Faust Summer Workshop at
Stanford University. 8 In previous editions of
the workshop, students had to go through the
library files to get the documentation of specific
functions. During last year’s workshop, thanks
to the new library documentation, students
were able to find information about functions
simply by searching the documentation
file. Additionally, none of them encountered
problems while using the new libraries, which
was very satisfying.

The Faust libraries are meant to grow
with time, and we hope that this new format
will facilitate the integration of new contributions.
Eventually, we plan to divide
filters.lib into more subcategories, like we
did for the old oscillator.lib. Finally,
physmodels.lib, a new library for
physical modeling of musical instruments, is
currently under development.

5 Conclusions 

The new Faust libraries provide a platform
to easily prototype DSP algorithms using the
Faust programming language. Their new organization,
in combination with their automatically
generated documentation, simplifies the
search for specific elements covering a wide
range of uses. New “standard functions” help
to point new users to useful elements to implement
various kinds of synthesizers, audio effects,
etc. Finally, we hope that this new format will
encourage new contributions.
Abstract

For L'Imaginaire music ensemble, I composed a
piece of interactive music involving acoustic
instruments and surround electronic music. The
interaction between live musicians and electronics
is based on data collected in real time from
the acoustic instruments. This data is then used to
adjust timbre and synchronize the electronics with
the rest of the music.

Keywords 

SuperCollider, interactivity, mixed music

1 Introduction 

For L'Imaginaire 1 music ensemble, I composed
a piece of interactive mixed music. Mixed music is
a term used in musicological literature for a
musical genre defined by the combination of
instrumental music and electronic music. In this
paper, I focus on an interactive property of
my piece.

Historically, the electronic part of mixed music
is composed in a studio and fixed on magnetic
tape 2 . This technique makes interaction between
the performers and the electronic sound
impossible. Feedback helps to improve the
compositional process of the tape. The other
technique is to perform audio processing of the
acoustic instruments in real time 3 . In this case, we


1 Musical ensemble composed of Keiko Murakami
(flutes), Philippe Koerper (saxophones) and Maxime
Springer (piano). http://www.limaginaire.org/.

2 The first pieces of music using this technique: 
Orphee 51 and Orphee 53 by Pierre Schaeffer and 
Pierre Henry (1951 and 1953) and Musica su due 
dimensioni by Bruno Maderna (1952 and 1958). 

3 The first pieces of music using this technique:
Mixtur (1964) and Mikrophonie I (1964) by Karlheinz
Stockhausen.


are talking about a device that extends the sonic
possibilities of the acoustic instrument. Feedback
is used to regulate the electronic sound, by a human
or an automaton.

I use these two techniques of electronic sound
accompaniment in my work, but the increase in
computing power and the versatility of the tools
have opened a middle way. In the next section, I
present the problem I address. I then illustrate
my approach with an example that could be
extended to other devices.

2 Problem

According to Robert Rowe, "Interactive
computer music systems are those whose behavior
changes in response to musical input. Such
responsiveness allows these systems to participate
in live performances, of both notated and improvised
music" [1]. How can a system listen to a musician
and make an appropriate decision to generate a
sound response? In his book, Rowe analyzes
systems that use the MIDI standard to
communicate between instruments and computers.
But how can we use traditional instruments?

Audio descriptors 4 are parameters
that describe an analyzed sound. A set of
descriptors is used to construct a data set and
create spaces for representing the sound. The
extracted parameters can be described in
different ways, depending on what is expected
from the information conveyed by the parameter.

The sound acquisition sensor for the processing 
unit is a microphone. Therefore, the use of audio 
descriptors in interactive computer music systems 
allows the use of standard acoustic instruments. 

4 With SuperCollider, it is necessary to install its
extension package to benefit from a wide choice of
descriptors: https://github.com/supercollider/sc3-plugins.
To control electronic sound with the sound of an
acoustic instrument, many descriptors can be used.
To do this, it is necessary to match a sound
characteristic with the values of the parameters
that describe it. Establishing this correspondence
allows us to set a particular threshold;
once it is crossed, the system can trigger a
response.

However, the interval between the extreme
values of a parameter depends on its nature and on
the analyzed sound source. To use this technique
of interaction between an instrumentalist and
electronic sound, a first difficulty is to cope
with the heterogeneity of the data produced by the
different audio descriptors.

For example, I want to use the dynamics and pitch
of a sound to build a particular accompaniment.
The amplitude descriptor returns a number that can
range from 0 to 1 and the pitch descriptor returns a
frequency. The range of extreme values returned
by the pitch descriptor depends on the range of the
analyzed sound source. Determining a trigger
threshold to activate a response of the system
involves combining values of different magnitudes,
sometimes from sound sources of different natures.

Moreover, this technique relies on sound capture,
and the captured information varies with the
environment. Consequently, settings worked out in a
studio with a particular electro-acoustic chain
would be less effective in another location with a
different electro-acoustic chain. So, how can we
handle the heterogeneity of the data produced by
different audio descriptors, different sound
sources, different electro-acoustic chains and
different concert locations, to achieve a
homogeneous electronic music accompaniment
for each performance of the same piece?

3 Creating a repository 

The first step is to establish a reference. This 
will serve as a standard for a single piece, a 
specific electro-acoustic chain and a particular 
place. We will be obliged to renew it each time 
one of these three parameters changes. 
Furthermore, this repository will allow us to 
characterize our data and to determine thresholds 
for performing a particular electronic sound 
accompaniment. 

In the following example, we use the amplitude
descriptor (Amplitude.kr) and the pitch descriptor
(Pitch.kr). The values are transmitted by the OSC
protocol (SendReply.kr) to the SuperCollider clients
at the /dataTrigger address. The data is sent whenever
an onset is detected (Onsets.kr). When
instantiating the synthesizer, two arguments are
available. The first selects the input of our audio
interface to be analyzed (SoundIn.ar). The second
determines the onset detection threshold.

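(
// Analysis synth: sends amplitude and pitch on every detected onset
SynthDef(\dataTrigger, { arg in = 0, onsetsThres = 0.5;
    var signal = SoundIn.ar(in);
    var chain = FFT(LocalBuf(1024), signal);
    var trig = Onsets.kr(chain, onsetsThres);
    // The reply values arrive as msg[3] (amplitude) and msg[4] (pitch)
    SendReply.kr(trig, '/dataTrigger',
        [Amplitude.kr(signal), Pitch.kr(signal)[0]]);
}).add;
)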

Figure 1 

Executing Figure 1 only sends the
definition of the synthesizer to the SuperCollider
audio server 5 . The synthesizer is not instantiated, so it
does not run and does not consume hardware
resources. We will run 6 it only when we want
to receive data (Figure 2).

We now have to build a data collector to
constitute our repository. To do this, we use the
OSCFunc object, a fast responder for incoming
OSC messages. We configure it with the
previously defined OSC address. When a new
message arrives from the analyzer, it executes a
function. In this case, the function saves the
amplitude and pitch of the signal in arrays held in
a global variable.

(
var onsetsThres = -9.dbamp;
~dataTrigger = Dictionary.new;

// Run analytical synthesizer
~analyzerTrig = Synth(\dataTrigger, [\onsetsThres, onsetsThres]);

// Saving data from the current analysis
~oscTrigger = OSCFunc({ arg msg;
    ~dataTrigger[\amp] = ~dataTrigger[\amp].add(msg[3]);
    ~dataTrigger[\pitch] = ~dataTrigger[\pitch].add(msg[4]);
}, '/dataTrigger');
)

// At the end, free objects
~analyzerTrig.free; ~oscTrigger.free;

Figure 2 
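
The indices 3 and 4 used by the collector follow from the layout of a SendReply message: the address, the node ID and the reply ID come first, followed by the transmitted values. The Python sketch below mimics the collector outside SuperCollider to make this explicit; the message contents are invented for illustration, and dbamp is a re-implementation of the usual decibel-to-amplitude conversion, not a call into SuperCollider.

```python
from collections import defaultdict


def dbamp(db):
    """Convert decibels to linear amplitude (10 ** (dB / 20))."""
    return 10 ** (db / 20)


def collect(store, msg):
    """Append amplitude and pitch from one reply message to the store."""
    # msg[0] = address, msg[1] = node ID, msg[2] = reply ID, then the values.
    store["amp"].append(msg[3])
    store["pitch"].append(msg[4])
    return store


store = defaultdict(list)
# Invented message: amplitude 0.42 and pitch 440 Hz.
collect(store, ["/dataTrigger", 1000, -1, 0.42, 440.0])
print(store["amp"], store["pitch"], round(dbamp(-9), 3))
```

With this conversion, an onset threshold of -9 dB corresponds to a linear value of roughly 0.355.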

We set the onset detection threshold to -9 dB.
We can change this parameter to change the
density of the received data. The analyzer listens
to the first input of our audio interface (default
setting). In Figure 2, running the first block starts
the acquisition of the data. Running the last line
kills the synthesizer and responder instances, and
frees memory and processor usage. The collected
information is stored in arrays. We can
plot the data as a graph or a histogram (Figure 3).

5 http://doc.sccode.org/Classes/SynthDef.html.
6 http://doc.sccode.org/Classes/Synth.html.

~dataTrigger[\pitch].plot;
~dataTrigger[\pitch].plotHisto;

Figure 3 

The graph shows the evolution of the parameter
over time and allows us to relate
a sound characteristic to particular values.
The histogram provides a representation of the
distribution of the values of the analyzed parameter.
This observation allows us to characterize the
distribution produced by a descriptor.

4 Using the repository 

In our system, we determine the value of a
threshold that triggers a response, in order to
produce a dynamic electronic music accompaniment.
This choice may be arbitrary or be determined by
a specific feature of the analyzed sound
source. For example, above a certain value, our
program triggers a response. We can also
choose this value according to its frequency of
appearance in the distribution. Depending on the
resulting sound, we can then
increase or decrease the density of our electronic
accompaniment by modifying the value of our
threshold. However, how do we handle values of
different magnitudes?

We query the repository by percentile.
To use this method, it is necessary to
install an additional library: the SuperCollider
package manager 7 can be used to install
MathLib, which gives us access to additional
statistics methods for arrays.

The percentile rank corresponds to the
proportion of the values of a distribution less than
or equal to a given value. We express ranks
as float values from 0 to 1. For example, if
we want to know the value below which 90% of our
data lie, we use 0.9.

~dataTrigger[\pitch].percentile(0.9); 

Figure 4 

During the adjustment phase of our system, we 
can tune several parameters of different 
magnitudes transparently with a single scale. 
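
The idea of tuning parameters of different magnitudes with a single scale can be sketched outside SuperCollider. The Python function below computes a percentile by linear interpolation between sorted samples; this interpolation rule is an assumption, and MathLib's percentile method may use a different one. The data values are invented for illustration.

```python
def percentile(values, rank):
    """Value at the given rank (0..1), interpolated between sorted samples."""
    data = sorted(values)
    if not data:
        raise ValueError("no data collected")
    position = rank * (len(data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])


# Invented repository: pitches in Hz and amplitudes between 0 and 1.
pitches = [110.0, 220.0, 440.0, 880.0]
amps = [0.1, 0.2, 0.4, 0.8]

# The same rank yields a threshold in each descriptor's own unit.
pitch_thres = percentile(pitches, 0.5)
amp_thres = percentile(amps, 0.5)
print(pitch_thres, amp_thres)
```

Because the rank is unitless, the same number can be reused when the instrument, the microphone or the room changes, provided the repository is recorded again.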


5 System Response 

In order for our system to respond to certain
stimuli, we must give it a means of sound
production. To do this, we define an arbitrary
synthesizer (Figure 5).

(
SynthDef(\fmGrain, { arg out = 0, amp = 0.75, density = 20, carfreq = 440,
    modfreq = 200, modIndex = 1, pos = 0, dur = 3;
    var env = EnvGen.kr(Env.perc, levelScale: amp, timeScale: dur,
        doneAction: 2);
    var signal = FMGrain.ar(Impulse.ar(density), 0.05, carfreq,
        modfreq, env * modIndex, env);
    Out.ar(out, Pan2.ar(signal, Line.kr(pos, pos.neg, dur)))
}).add;
)

Figure 5 


To implement a concrete example, we assume
that our device listens to two types of percussion,
one of which emits higher-pitched sounds
than the other. We decide that our system will
respond only to the higher-pitched percussion
and only to its loudest
sounds. Along with the onset detection threshold, our
condition for triggering a response depends on
two other parameters:


pitchThres = ~dataTrigger[\pitch].percentile(0.7);
ampThres = ~dataTrigger[\amp].percentile(0.5);

Figure 6 


If the frequency of the responses does not suit us,
we can revise the values of these
variables to modify the sensitivity of our system.


(
var pitchThres, ampThres, onsetsThres;

// Interaction control
pitchThres = ~dataTrigger[\pitch].percentile(0.7);
ampThres = ~dataTrigger[\amp].percentile(0.5);
onsetsThres = -9.dbamp;

// Run analytical synthesizer
~analyzerTrig = Synth(\dataTrigger, [\onsetsThres, onsetsThres]);

~oscReply = OSCFunc({ arg msg;
    // Condition for reply density
    if((msg[4] > pitchThres).and(msg[3] > ampThres), {
        // Mapping control
        Synth(\fmGrain, [
            \density, msg[3].linlin(ampThres, 1, 3, 20),
            \carfreq, msg[4].linlin(
                pitchThres,
                ~dataTrigger[\pitch].maxItem,
                ~dataTrigger[\pitch].percentile(0.05),
                ~dataTrigger[\pitch].percentile(0.95)),
            \modfreq, msg[4] * msg[3].linlin(ampThres, 1, 0.5, 3),
            \modIndex, msg[3].explin(ampThres, 1, 1, 20),
            \dur, msg[3].linlin(ampThres, 1, 2, 10),
            \pos, 1.0.rand2
        ]);
    });
}, '/dataTrigger');
)

Figure 7 


7 http://doc.sccode.org/Classes/Quarks.html.
The implementation of a response for our
system follows the same structure as Figure 2. A
responder waits for incoming OSC messages from
the analyzer. When a new message arrives, if it
meets the previously formulated conditions
(high-pitched and loud), a response is issued.

This response is customized according to the
analysis data. This connection is made by a
sonification process, a "technique of rendering
sound in response to data and interactions" [2]. I
do not deal with the mapping technique in this
paper, but its mastery is a source of variation and
expressiveness for electronic music. To establish this
relation, we map each synthesizer parameter from
an input range to an output range. We can set the
input range with the repository information and
adjust the output range according to the desired sound
quality. Figure 7 is the implementation of the
response of our system.
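
The mapping from an input range to an output range can be sketched in a few lines of Python. The linlin function below re-implements the linear mapping with clipping to the input range, which is how the SuperCollider method of the same name behaves by default as far as we know; the response function and its threshold values are hypothetical and only reproduce two of the mappings of Figure 7.

```python
def linlin(x, in_min, in_max, out_min, out_max):
    """Map x linearly from [in_min, in_max] to [out_min, out_max], clipping x."""
    x = min(max(x, in_min), in_max)
    return (x - in_min) / (in_max - in_min) * (out_max - out_min) + out_min


def response(amp, pitch, amp_thres, pitch_thres):
    """Return synthesizer settings if both thresholds are exceeded, else None."""
    if pitch > pitch_thres and amp > amp_thres:
        return {
            "density": linlin(amp, amp_thres, 1.0, 3.0, 20.0),
            "dur": linlin(amp, amp_thres, 1.0, 2.0, 10.0),
        }
    return None


# Hypothetical thresholds: amplitude 0.5 and pitch 300 Hz.
print(response(0.1, 100.0, 0.5, 300.0))
print(response(1.0, 500.0, 0.5, 300.0))
```

Using the thresholds as the lower bound of the input range means that a sound just above the threshold produces the sparsest and shortest response, and the loudest sound the densest and longest one.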

At the end, we must free objects for frees 
memory and processor usage. 


We can parallelize other listeners who analyze 
other sound qualities (brightness, noise, 
dissonance, etc.) to achieve other triggers 
threshold and make complex electronic music 
accompaniment executed by our system. 

For our player, we can define a maximum 
number of synthesizers executed in parallel in 
order to preserve the resources of the system and / 
or to control the acoustic density so as not to 
saturate our perception. 

In addition, we can perform a certain musical 
process 8 or that feeds on our repository instead of 
running a simple synthesizer. The interactive 
system developed by Jean-Claude Risset for his 
duets for one pianist [3] is very interesting for this 
way. He uses MIDI data (pitch, velocity and 
duration) which he transforms according to 
traditional compositional operations: 

transposition, reversal, canon, etc. 

7 Implementation 


-analyzerTrig.free; -oscReply.free; 

Figure 8 

6 For further 

For this paper, we concentrated our system to its 
simplest expression. In this chapter, we wish to 
develop it design. Some of these ideas were 
conceived during the development of our piece and 
others afterwards. 

Robert Rowe divides his interactive computer 
music system ( Cypher ) into two sections [1]. The 
listener analyzes the data produced by a musician 
and the player delivers a musical response. The 
structure of SuperCollider source code implies this 
organization. Our analysis synthesizer is the 
listener and the function of the responder object 
for incoming OSC messages is the player. We keep 
these terms to locate the following points. 

Our listener can also have an implicit function of 
time master. Indeed, we transmit the data of the 
analysis every time an onset is detected. But we 
can transmit them at a given frequency. The 
responses delivered by the system would then be 
have a beat. 

We can use our listener to produce an 
automation (with Env and EnvGen.kr objects). An 
automation allows to control and to automate the 
variation of a parameter over a given time. In this 
way, we determine the evolution of any threshold 
or parameter. 


I control my audio processor by a graphical 
interface (Figure 9) and MIDI controller. The 
different audio tracks allow me to adjust the 
intensity of the electronic sound layers. The flute, 
sax and piano tracks deal with interactive 
electronic sound accompaniment. The synth track 
manages my non-real time composite electronic 
music. Finally, the live track amplifies the 
acoustic instruments. 

The repository is created directly in the
interface, which makes this action transparent.
The graphical interface allows a sound engineer to
play my work and makes rehearsals and concerts
easier.



A MIDI pedal assigned to a performer manages
the overall setting. Fourteen key moments
articulate the electronics for a duration of fifteen
minutes.

8 http://doc.sccode.org/Tutorials/A-Practical-Guide/PG_01_Introduction.html

To create my electronic accompaniment, I use
the descriptors of amplitude, pitch, centroid and
noise. Coupling the data of the analysis with the
parameters of the synthesizers [2] makes it
possible to control the form of the electronic
accompaniment by scaling, transposition or
reversal. This relation can be fixed or variable.

Generally, the sound source of my electronic
music accompaniment does not come from sound
synthesis, but from acoustic instruments. When the
density of the responses of the system is not 
limited, the responses are superimposed to 
continuously transform the timbre of the electronic 
sound. When the density of the responses of the 
system is limited, the responses can arrive in 
successive waves and produce a dynamic 
accompaniment. 

8 Epilogue 

To design our system, our initial motivation was 
to simplify the use of different descriptors, to 
simplify our system settings, to customize the 
responses of the system according to the sensitivity 
of the interpreter and to produce a homogeneous 
electronic accompaniment to each interpretation of 
the same notated music under different conditions. 

But in addition, we obtain an open interactive 
system that can be adapted from a specific model 
to the intuition of a musician. We identified four 
steps to explore in order to implement our practical 
solution and develop an interactive scenario [4]. 

The first step focuses on the sound of the 
instrumentalist. What particularities of sound do 
we want to relate to our system? What filter do we 
want to use to trigger an answer? In the example 
developed for this paper, we make our filter with 
the parameters of onset detection, pitch and 
amplitude. The thresholds established to constitute 
this filter allow us to play on the sensitivity or the 
particularity of the answers delivered by our 
system. 

The second step in implementing our solution
is to choose the type of response to trigger. Should
the answer be monophonic, polyphonic, 
contrapuntal, etc.? In other words, what 
organizational model should we use to develop our 
response? In our example, we produce one item 
per answer. This element is strongly correlated 
with the sound analyzed and the operations applied 


to determine the sound characteristics of our 
response are conceived during the last step of this 
practical solution. 

The third step in using our system is to 
determine which synthesizer we want to assign to 
our system. Controls can be implemented in 
synthesizers. This possibility can give us solutions 
to build a previously chosen model. For example, 
a synthesizer can perform a glissando. In the 
example developed for this paper, we use a simple 
granular FM synthesis. 

The final step in implementing our solution is to 
determine the type of relationship between the 
analysis data and the parameters of our 
synthesizer. How can we get expressive sounds from
a sonification process [2]? Should our relationship
be static or dynamic? How should the plan of 
connections between these various elements be 
established? For the example developed in this 
paper, we have established a one-to-many and 
many-to-one static connection plan. The 
amplitude determined by the analysis is correlated 
with the granular density of the synthesis, the 
modulation index and the duration of the 
response. The pitch determined by the analysis is 
correlated with the pitch of our response. A 
transposition is performed by a scaling operation. 
Finally, the amplitude and the pitch analyzed 
serve to determine the modulation frequency of 
the FM synthesis of the response delivered by our 
system. 
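
The connection plan described above can be sketched in Python as follows. This is only an illustration of the one-to-many and many-to-one idea: the parameter names, all numeric ranges and the way amplitude and pitch are combined into the modulation frequency are assumptions, not the values used in the piece.

```python
def linlin(x, in_lo, in_hi, out_lo, out_hi):
    """Linear rescaling of x from one range to another, with clipping."""
    x = min(max(x, in_lo), in_hi)
    return out_lo + (x - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def map_analysis(amp, freq, transpose_ratio=2.0):
    """Map analysis data (amplitude 0..1, pitch in Hz) to synth parameters."""
    return {
        # one-to-many: amplitude drives three parameters
        "density": linlin(amp, 0.0, 1.0, 5.0, 50.0),    # grains per second
        "mod_index": linlin(amp, 0.0, 1.0, 0.5, 8.0),
        "duration": linlin(amp, 0.0, 1.0, 0.2, 3.0),    # seconds
        # pitch drives pitch; transposition is a scaling operation
        "carrier_freq": freq * transpose_ratio,
        # many-to-one: amplitude and pitch together set the modulation frequency
        "mod_freq": freq * linlin(amp, 0.0, 1.0, 0.5, 2.0),
    }


params = map_analysis(amp=0.5, freq=220.0)
```

A dynamic connection plan could be obtained by changing the output ranges or `transpose_ratio` over time instead of keeping them fixed.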

9 Conclusion 

For this paper, we have implemented our open
interactive system in SuperCollider, a platform
for audio synthesis and algorithmic composition.
We could have implemented this system in other
software. Moreover, using free software increases
the durability of our work. Laurent Pottier recalls
the history of the precariousness of technologies
in electronic music [5], and free software addresses
this problem.

An example concerns the porting of Pluton by
Philippe Manoury from the 4X (see footnote 9) to
Max [6]. The piece did not sound exactly the same
on both platforms. After a thorough study of the
4X, the engineers discovered that a 4X hardware
limitation influenced the sound result. This
limitation was reproduced in the Max patch to
obtain an equivalent electronic sound [7].

9 The 4X is a real-time effects processor designed by
Giuseppe Di Giugno at IRCAM in the 1970s.

Free software is an important factor of durability
and reproducibility in the digital arts. The ubiquity
[8] of free software allows more flexibility to
imagine original devices [9]. In the end,
researchers face no lock-in when they study and
enrich the common good.

Abstract

In this paper we present a library for 3D Higher
Order Ambisonics (HOA) for the SuperCollider (SC)
sound programming environment. The library
contains plugins for all standard operations in a
typical Ambisonics signal flow: encoding,
transforming and decoding up to the 5th order.
Carefully designed PseudoUgens are the interface to
those plugins, aiming for the best possible code
flexibility and reusability. As a key feature, the
implementation is designed to handle the higher
order B-format as a channel array and to obey the
channel expansion paradigm in order to take
advantage of the powerful scripting possibilities of
SC. The design of the library and its components is
described in detail. Moreover, some examples are
given of how to build flexible HOA processing
chains with the use of node proxies.


Keywords 

SuperCollider, Higher Order Ambisonics. 


1 Introduction 


Ambisonics, i.e. the description of sound pressure
fields through spherical harmonics decomposition,
has been around for quite a while since its
invention by [Gerzon, 1973]. Back in the day, the
harmonics decomposition went up to the first order
(First Order Ambisonics, FOA), using the 4-channel
B-format (see footnote 1). The playback of FOA
audio content depended on special hardware and did
not make it into mainstream audio in the first
decade of its existence for various reasons, one of
which being that FOA offers only limited spatial
resolution.

Ambisonics research made significant advances in
the 2000's through the work of Bamford [Bamford,
1995], Malham [Malham, 1999] and Daniel [Daniel,
2000], who extended the sound pressure field
decomposition to higher orders, hence the term
Higher Order Ambisonics (HOA). HOA increases the
spatial resolution and thereby reduces the
limitation of low spatial definition when compared
with other spatialization techniques.

1 The term B-format is often used for FOA signals; in
this paper we use the term also for higher orders.

For streamlining and standardising content
production, one hurdle that HOA faced in the past
was the coexistence of various channel ordering and
normalization conventions. In order to address this
issue, the Ambix standard was proposed by [Nachbar
et al., 2011] and has since been increasingly
adopted by recent HOA implementations.

Today, processors can handle with ease multiple
instances of multi-channel sound processes.
Further, the rise of video games and Virtual
Reality (VR) applications has elicited new interest
in Ambisonics amongst audio researchers and content
creators. This is mostly due to its inherent
property to yield easy-to-manipulate isotropic 360
degree sound pressure fields, which can be rendered
either through multi loudspeaker arrays or
headphones. In the case of VR applications,
head-tracking is already available and the listener
is always in the sweet spot. For the capturing of
HOA 3D sound pressure fields, various microphone
array prototypes have been developed, some of them
being available as commercial products, like mh
acoustics' Eigenmike® for instance. As far as multi
loudspeaker reproduction is concerned, the number
of loudspeaker domes with semi spherical
configurations is growing, and electroacoustic
composers have also shown increasing interest in
HOA as a spatialisation technique, notably for
composition [Barrett, 2010] and sonification
[Barrett, 2016].


1.1 Ambisonics in various platforms 

Over the last few years HOA has seen various
implementations in diverse sound software
environments, mostly as plugins in DAWs. The
Ambisonics Studio plugins by Daniel Courville, for
instance, have been around for some time (see
footnote 2). Another recent and very comprehensive
example is the Ambix plugin suite from
[Kronlachner, 2013], also see [Kronlachner, 2014a].
For Pure Data and MaxMSP, HOA libraries have been
made available by the Centre de recherche
Informatique et Creation Musicale (see footnote 3);
an early implementation can also be found with the
ICST Ambisonic tools [Schacher, 2010]. An early
version for Pure Data can be found in the collection
of abstractions called CubeMixer by [Musil et al.,
2003]. Recently, the HOA library Ambitools (see
footnote 4), developed mainly in Faust, has been
made available [Lecomte and Gauthier, 2015].

2 http://www.radio.uqam.ca/ambisonic

1.2 SuperCollider 

The audio synthesis environment SuperCollider
(SC) by [McCartney, 2002] is particularly well
suited for the creation of dynamic audio scenes.
SC is split into two parts: the server scsynth for
efficient sound synthesis and sclang, an object
oriented programming language for the flexible
configuration and re-patching of DSP trees on the
server. Similar to most sound programming
environments, synthesis in SC is based on unit
generators called Ugens. Third party Ugens are
collected separately in SC3plugins. Extensions to
sclang are managed through Quarks. Ugens can be
assembled into more complex arrangements through
synthesis definitions, known as SynthDefs, which
are executable binaries for synthesis in scsynth.
In sclang, PseudoUgens can be created, which are
another way of handling complex arrangements of
Ugens and which are compiled for scsynth when
needed. For a detailed introduction to SC see
[Valle, 2016] and the SuperCollider book (see
footnote 5) by [Wilson et al., 2011].


1.3 Ambisonics in SuperCollider 

In 2005, Frauenberger et al. implemented HOA
in SC as the AmbIEM Quark (see footnote 6). This
implementation goes up to the 3rd order, and follows
the old Furse-Malham channel ordering and
normalization. All unit generators (Ugens) like
encoding, rotation, and simple decoding are
implemented in sclang as PseudoUgens. AmbIEM comes
with a simulation of early reflections in a virtual
room but lacks functionality such as beamforming.
The Ambisonics Toolkit (Atk) for SC by [Anderson
and Parmenter, 2012] is a more recent and very
comprehensive set of tools. The Atk includes, for
instance, various transformations to manipulate the
directivity of the sound field, such as pushing and
zooming. It is however only a first order
implementation of Ambisonics.

3 http://www.mshparisnord.fr/hoalibrary/en
4 http://faust.grame.fr/news/2016/10/17/Faust-Awards-2016.html
5 http://supercolliderbook.net/
6 https://github.com/supercollider-quarks/AmbIEM


1.4 Library design in SuperCollider 

In this context the paper presents a modern HOA
implementation for SC, which is modular and adopts
all established standards in terms of channel
ordering and normalizations. Inspired by the
approach found in the Atk and typical for the
general design of SC, computationally intensive
parts (the Ugens) are split from the PseudoUgens,
which are convenience wrapper classes in sclang.
The HOA library hence comes in three parts:
SC3plugins, PseudoUgens, and audio content plus
HRTFs for binaural rendering in a support
directory.


1.4.1 SC3plugins 

The first part of the library is a collection of
Ugens, which is part of the SC3plugins collection.
Each Ugen is compiled from C++ code. It consists of
an SC language side representation of the Ugen as a
.sc class file and a .scx, .so or .dll compiled
dynamic link library for the platforms OSX, Linux
or Windows respectively. For each Ambisonics order
(so far up to order 5), there are individual Ugens
for the encoding, transforming and decoding
processes in a typical Ambisonics signal flow. The
C++ code for these Ugens is generated from the HOA
library Ambitools [Lecomte and Gauthier, 2015] with
the compilation tool faust2supercollider. This
approach was taken for two reasons:

First, to leverage the work already accomplished
in Faust. Indeed, the Faust compiler generates very
efficient DSP code and the Faust code base makes it
possible to efficiently combine existing
functionality. The meta approach through Faust will
lead to future additions of functionality, which
can then be easily integrated in the HOA library
for SC.

Second, each Ambisonics order comes with a defined
multichannel B-format, which in turn defines the
number of input and output arguments for the Ugens.
For instance, a Ugen rotating an Ambisonic signal of
order 3 has 16 input arguments plus the rotation
angles and 16 output channels. While it is of
interest to expose the Ambisonics order as an
argument for the flexibility and reusability of code
on the side of sclang, it is an argument unlikely to
be changed while an instance of the Ugen is running
as a node in the DSP tree on scsynth. This is why for
every order there is a unique Ugen for each function
(encoding, transforming, decoding).

[Figure 1 diagram: Encoding PseudoUgen, Transforming
PseudoUgen, Decoding PseudoUgen]

Figure 1: The Ambisonics processing chain in
the HOA library for the selected order 3 with
16 channels.
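
The relation between Ambisonics order and B-format channel count used above (order 3, 16 channels) follows from the number of spherical harmonics up to that order. A minimal Python sketch, with a function name chosen for this example:

```python
def hoa_channels(order):
    """Number of B-format channels of a full 3D Ambisonics signal."""
    if order < 0:
        raise ValueError("Ambisonics order must be non-negative")
    return (order + 1) ** 2


# Channel counts for the orders 0 to 5 covered by the library.
channel_table = {n: hoa_channels(n) for n in range(6)}
```

The table makes the cost of raising the order visible: going from order 3 to order 5 more than doubles the number of channels to process.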

The Ugens follow these conventions (some of
which are explained in subsequent sections):

• The Ambisonics channels are ordered according
to the ACN convention.

• The default normalisation of the B-format
is N3D.

• All azimuth and elevation arguments follow
the spherical coordinates convention from
SC.

• Operations of resource intensive Ugens can
be bypassed.

Based on the implementation in Faust, the
main functionalities of the HOA Ugens provided
as SC3plugins are so far:

• Encoding and decoding of plane waves and
spherical waves using near field filters.

• Mirroring, Rotation (around azimuth and
full 3D).

• Various Ugens for beamforming, returning
mono as well as B-format signals.

• Various decoders in conjunction with
Head-Related Impulse Responses (HRIRs) for
binaural monitoring.


1.4.2 PseudoUgens 

The second part of the library is available as
the sclang extension HOA Quark. While the
SC3plugins are designed for computational
efficiency of the sound synthesis processes, the
HOA Quark is conceived to unlock the flexibility of
making sound in SC with respect to code reusability
and the scaling of synthesis scripts. Each typical
operation in Ambisonics (Encoding, Transforming,
Decoding) is here provided as a PseudoUgen.
Depending on the Ambisonics order provided as an
argument, the PseudoUgen returns and instantiates
the correct Ugen from the SC3plugins collection on
the sound server. Since the Ambisonics order is an
argument for the PseudoUgen, the number of channels
in the B-format varies and so does the number of
input arguments in the Ugens. This is why the
B-format is handled as a channel array. This makes
the SC code flexible for experimentation with
different orders depending on computational
resources. All arguments of the PseudoUgen obey the
channel expansion paradigm. This means that if any
of the arguments is an array (or an array of
arrays), the PseudoUgen returns an array (or an
array of arrays) of Ugens.
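
The channel expansion paradigm can be modelled outside SC with a few lines of Python. The sketch below is a simplified model, not the library's code: `multichannel_expand` and `fake_encoder` are hypothetical names, shorter arrays wrap around as in SC, and the special handling needed when an argument is itself a B-format channel array is left out.

```python
def multichannel_expand(fn, *args):
    """Call fn once per array element if any argument is a list.

    Shorter lists wrap around; nested lists expand recursively, so an
    array of arrays yields an array of arrays of results.
    """
    sizes = [len(a) for a in args if isinstance(a, list)]
    if not sizes:
        return fn(*args)
    return [
        multichannel_expand(
            fn, *[a[i % len(a)] if isinstance(a, list) else a for a in args]
        )
        for i in range(max(sizes))
    ]


def fake_encoder(order, source, azimuth, elevation):
    """Stand-in for an encoder; returns a label instead of a Ugen."""
    return f"enc{order}({source}, az={azimuth}, el={elevation})"


single = multichannel_expand(fake_encoder, 1, "sinA", 0.0, 0.0)
pair = multichannel_expand(fake_encoder, 1, ["sinA", "sinB"], [0.0, 1.0], 0.0)
```

With scalar arguments one encoder is returned; with two sources and two azimuths, two encoders are returned, one per source and direction.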

Figure 1 shows the relation between the SC
language side (PseudoUgens in light grey) and
the SC3plugins (Ugens in dark grey). If the
Ambisonics order is set to 3 and passed as an
argument to the PseudoUgen, the corresponding Ugen
with 16 channels is returned and a typical
processing chain (Encoding, Transforming, Decoding,
encircled in red) can be established. The
main features of the design of the HOA library
implementation on the language side are:

• B-format is handled as a channel array. 

• All arguments obey the channel expansion 
paradigm. 

This leads to the following advantage when
scripting HOA sound scenes in SC. In graphical
data flow programs such as Pure Data, changing
the order means reconnecting all channels between
objects interfacing with the B-format. In SC,
changing the Ambisonics order in a single global
variable changes the order of the whole HOA
processing chain.

1.4.3 Support directory 

The third part of the library is a platform
independent support directory for various HOA
sound file recordings and convolution kernels
from HRIRs for the binaural rendering of HOA
sound scenes. The support directory approach
is similar to the Atk implementation, from which
we adapted the corresponding class. The reason to
keep these resources separate from the Quark
directory is mostly the size of the included sound
files. We provide some 4th order HOA sound files
that have been recorded with the Eigenmike® with
the support of Romain Dumoulin from CIRMMT. The
HRIRs provided are either measured from a KU-100
dummy head [Bernschuetz, 2013] or computed from 3D
mesh scans of several people's faces. The
directions of the HRIRs follow a 50-node Lebedev
grid, allowing an Ambisonic binaural rendering up
to order 5 [Lecomte et al., 2016b].


2 Encoding 

As the first step in an Ambisonics rendering
chain, the library provides PseudoUgens for
encoding into the B-format: one for the encoding
of mono sound signals, one for microphone array
prototypes and one for the commercially available
Eigenmike® microphone array.


2.1 HOAEncoder 


This PseudoUgen creates an HOA scene from
mono inputs encoded as a (possibly moving)
sound source in space. The source can be encoded
1) as a plane wave with azimuth and elevation
$(\theta_p, \delta_p)$ respectively, or 2) as a
spherical wave with position
$(r_s, \theta_s, \delta_s)$, where $r_s$ is the
distance of the source to the origin. The spherical
wave is encoded using near-field filters [Daniel,
2003]. In the current implementation, those filters
are stabilized with near-field compensation
filters. Thus, in this case, the radius of the
loudspeaker layout $r_{spk}$ used for decoding is
needed. Note that if the spherical source radius is
such that the source is focused inside the
loudspeaker enclosure ($r_s < r_{spk}$), a
"bass-boost" effect may occur with potentially
excessive loudspeaker gain. This effect increases
as the source gets closer to the origin [Daniel,
2003], [Lecomte and Gauthier, 2015].
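
For the plane wave case, the encoding gains at first order can be written down directly. The Python sketch below is not the Faust code behind HOAEncoder: it assumes real spherical harmonics in ACN channel order with N3D normalisation, angles in radians, azimuth measured counter-clockwise from the front and elevation measured upwards, and it omits the near-field filters entirely.

```python
import math


def encode_plane_wave_o1(sample, azimuth, elevation):
    """Encode one mono sample as a first order plane wave (ACN, N3D).

    Returns the four B-format channels in ACN order: W, Y, Z, X.
    """
    s3 = math.sqrt(3.0)
    gains = [
        1.0,                                            # ACN 0 (W)
        s3 * math.sin(azimuth) * math.cos(elevation),   # ACN 1 (Y)
        s3 * math.sin(elevation),                       # ACN 2 (Z)
        s3 * math.cos(azimuth) * math.cos(elevation),   # ACN 3 (X)
    ]
    return [g * sample for g in gains]


front = encode_plane_wave_o1(1.0, 0.0, 0.0)
left = encode_plane_wave_o1(1.0, math.pi / 2, 0.0)
above = encode_plane_wave_o1(1.0, 0.0, math.pi / 2)
```

Under N3D the sum of the squared gains is 4 for every direction at first order, which is one way to check the normalisation.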


HOAEncoder.ar(1,SinOsc.ar(f),a,e)
// returns
[OutputProxy,..,..,OutputProxy]

HOAEncoder.ar(1,
SinOsc.ar([f1,f2]),
a,e); // returns
[[OutputProxy,..,..,OutputProxy],
[OutputProxy,..,..,OutputProxy]]

HOAEncoder.ar(1,
SinOsc.ar([f1,f2]),
a,e).sum; // returns
[OutputProxy,..,..,OutputProxy]

If an array of azimuth and elevation arguments
is provided, matching in size that of the source
SinOsc.ar([f1,f2]), flexible and scalable code
for multi source encoding can be created.


2.2 HOAEncLebedev06 / 26 / 50, 
HOAEncEigenmike 


This collection of PseudoUgens first offers the
Discrete Spherical Fourier Transform (DSFT)
for various spherical layouts of rigid spherical
microphones. In the current implementation
the proposed geometries are the 06-, 26- or 50-node
Lebedev grids [Lecomte et al., 2016b] and the
Eigenmike grid [Elko et al., 2009]. The components
of the DSFT are then filtered to take into account
the diffraction by the rigid sphere and retrieve
the Ambisonic components [Moreau et al., 2006],
[Lecomte et al., 2015]. The filters are applied by
setting the filter flag to 1 as shown in the next
code listing:


// Encode the signals from the
// Lebedev26 grid microphone
HOAEncLebedev26.loadRadialFilters(s);

{HOAEncLebedev26.ar(4, SoundIn.ar(0!26), filters: 1)}.play


3 Converting 

In order to correctly reconstruct a sound field
from the channels of the B-format, it is important
to know about standard normalization methods for
the spherical harmonic components, as well as
channel ordering conventions. Two main channel
ordering conventions exist. The first is the
original Furse-Malham (FuMa) [Malham, 1999]
higher-order format, an extension of traditional
first order B-format up to third order (16
channels). FuMa channel ordering comes with maxN
normalization, which guarantees a maximum amplitude
of 1. The FuMa format has been widely used and is
still in use, but is increasingly replaced by the
second convention, the Ambisonic Channel Number
(ACN) ordering [Nachbar et al., 2011]. ACN
typically comes with either N3D (the full three-D
normalisation), where all signals are orthonormal,
or SN3D (Semi-Normalized 3D) spherical harmonics.
The SN3D normalization has the advantage that
none of the higher order signals exceeds the level
of the first Ambisonic channel, W (ACN 0).

However, SN3D does not provide an orthonormal
basis of spherical harmonics, and an orthonormal
basis is recommended for transformations which rely
on the orthonormality property of spherical
harmonics. Therefore, the library internally uses
N3D (full 3D normalization) with the ACN
convention.
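
The ACN ordering and the relation between the two normalisations can be stated compactly. The following Python sketch is an illustration rather than the HOAConvert code: it uses the usual definitions ACN = l^2 + l + m for degree l and index m, and N3D = SN3D * sqrt(2l + 1); the function names are chosen for this example and FuMa conversion is not covered.

```python
import math


def acn_index(l, m):
    """ACN channel number of the spherical harmonic of degree l, index m."""
    if abs(m) > l:
        raise ValueError("index m must satisfy -l <= m <= l")
    return l * l + l + m


def degree_of(acn):
    """Degree l of the spherical harmonic stored in ACN channel `acn`."""
    return math.isqrt(acn)


def sn3d_to_n3d(bformat):
    """Rescale one frame of ACN-ordered channels from SN3D to N3D."""
    return [x * math.sqrt(2 * degree_of(k) + 1) for k, x in enumerate(bformat)]


def n3d_to_sn3d(bformat):
    """Rescale one frame of ACN-ordered channels from N3D to SN3D."""
    return [x / math.sqrt(2 * degree_of(k) + 1) for k, x in enumerate(bformat)]


first_order_n3d = sn3d_to_n3d([1.0, 1.0, 1.0, 1.0])
```

Channel 0 (W) is unchanged by the conversion, while all first order channels are scaled by the square root of 3.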


3.1 HOAConvert

The HOAConvert PseudoUgen accepts a B-format
array as input and converts from and to ACN_N3D,
ACN_SN3D and FuMa_MaxN. It is mostly meant to
convert existing B-format recordings into ACN_N3D
for use within the library. The other use case is
to render B-format mixes to other conventions for
other production contexts.

4 Transforming

In its current implementation, the HOA library
provides three standard operations to transform
the B-format: rotation around the z-axis,
mirroring, and full 3D rotation.

4.1 HOAAzimuthRotator

This PseudoUgen rotates the HOA scene around
the z-axis, which is accomplished with a rotation
matrix in x and y due to the symmetry in z of the
spherical harmonics. For the matrix definition see
[Kronlachner, 2014b]. In combination with
horizontal head tracking, this transformation can
stabilise horizontal auditory cues for left-right
movements when the rendering is made over
headphones in VR contexts.

4.2 HOAMirror

This PseudoUgen mirrors an HOA scene at the
origin in the directions along the axes left-right
(y), front-back (x) and up-down (z). According to
[Kronlachner, 2014b], this can be accomplished
by changing the sign of selected spherical
harmonics.

4.3 HOARotatorXYZ

This PseudoUgen rotates an HOA scene by any
given angles around x, y, z. The rotation matrix is
computed in the spherical harmonic domain using
recurrence formulas [Ivanic and Ruedenberg, 1996].

5 Beamforming

5.1 HOAHCard2Mono

This PseudoUgen extracts a mono signal from
the HOA scene according to a beam pattern. The
channels from the B-format inputs are combined
to produce a monophonic output as if a directional
microphone were used to listen in a specific
direction in the sound field. In the current
implementation, the beam patterns provided are
regular hypercardioids up to order 5, see [Meyer
and Elko, 2002].

5.2 HOAHCard2HOA

This PseudoUgen applies a hypercardioid beam
pattern to the HOA scene to enhance some directions
and outputs a directionally filtered HOA scene
[Lecomte et al., 2016a]. The proposed beam patterns
are regular hypercardioids as described in [Meyer
and Elko, 2002]. The selectivity of the directional
filtering increases with the order of the beam
pattern. This transformation requires an order
re-expansion such that the output HOA scene should
be of the order of the input HOA scene plus the
beam pattern order [Lecomte et al., 2016a].


5.3 HOADirac2HOA 

As in the previous section, this PseudoUgen
performs a directional filtering on the HOA
scene, but this time the beam pattern is a
directional Dirac, that is to say a function which
is zero everywhere except in the chosen direction.
As a result the output HOA scene contains
only the sound from the chosen direction. Thus,
this tool helps to explore the HOA scene with
a "laser beam". For more details see [Lecomte
et al., 2016a].


6 Decoding 


For the decoding of HOA signals two different
ways of rendering the sound field are possible:
first via headphones, or second through a setup
of multiple loudspeakers.

For the headphone option the HOA signal is
decoded to spherically distributed virtual
speakers. For the best possible spatial resolution
more speakers are needed than there are channels in
the B-format. Each speaker signal is then convolved
with HRTFs and the resulting left and right
channels are summed respectively. For the
distribution of the virtual speakers a regular
distribution on the sphere is desirable, so that
the decoding matrix is well behaved. This is why,
according to [Lecomte et al., 2015] and similar
to the microphone array prototypes from above,
a Lebedev grid is chosen.

6.1 HOADecLebedev06 / 26 / 50

This collection of PseudoUgens decodes an
Ambisonics signal to up to 50 virtual speakers
positioned as nodes on a Lebedev grid. The decoding
on the 50-node Lebedev grid works up to order 5.
This grid contains two nested sub-grids which work
up to lower orders with fewer nodes [Lecomte et
al., 2016b]. Therefore, the first 6 nodes are
sufficient for first order and the first 26 nodes
are sufficient up to the third order. If the HRTF
filter flag is set to 1, the signals are convolved
with the kernels and summed up to yield a left and
a right headphone signal. Prior to this, the
convolution kernels need to be loaded to the sound
server as shown in the next code listing:


// load a HOA sound file
~file=Buffer.read(s,"hoa30.wav");
// prepare binaural filters
HOADecLebedev26.loadHrirFilters()
{HOADecLebedev26.ar(3,//order 3
PlayBuf.ar(16,~file,1,loop:1),
hrir_Filters:1)
}.play;
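
The choice of grid for a given order, as stated in the text above (6 nodes up to order 1, 26 nodes up to order 3, 50 nodes up to order 5), can be expressed as a small lookup. This Python sketch only restates those figures; the names are chosen for this example and are not part of the library.

```python
# Highest Ambisonics order supported by each Lebedev grid, as given in the text.
LEBEDEV_MAX_ORDER = {6: 1, 26: 3, 50: 5}


def smallest_lebedev_grid(order):
    """Smallest Lebedev grid (number of nodes) usable for this order."""
    for nodes in sorted(LEBEDEV_MAX_ORDER):
        if order <= LEBEDEV_MAX_ORDER[nodes]:
            return nodes
    raise ValueError("no Lebedev grid available above order 5")


def hoa_channels(order):
    """Number of B-format channels for a given order."""
    return (order + 1) ** 2


grid_for_order = {n: smallest_lebedev_grid(n) for n in range(1, 6)}
```

In every case the selected grid has more nodes than the B-format has channels, in line with the requirement stated for the virtual speakers.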





6.2 HOADec 

For the case of decoding for speaker arrays,
[Heller et al., 2008] distinguish 3 cases:

1. regular polygons (square, octagon) and
polyhedra (cube, octahedron)

2. semiregular arrays (non equidistant but
opposing speakers, like in a shoebox)

3. general irregular arrays (e.g. ITU 5.1, 7.1
... semispherical speaker domes)


For the cases 1 and 2, decoder matrices can
be obtained by matrix inversion. If, depending on
the positions of the speakers, the resulting
decoder matrix has elements of similar magnitudes,
it is suitable for signal processing. For case 3,
which is arguably the most realistic case, a
variety of state of the art techniques exists, see
for instance [Zotter et al., 2012], [Zotter et al.,
2010], and [Zotter and Frank, 2012]. An
implementation of these techniques exceeds the
scope of this library. However, for the
construction of decoders for specific irregular
speaker arrays, we refer the user to the Ambisonic
Decoder Toolbox by Aaron J. Heller (see footnote
7). This toolbox produces decoders as Faust files,
which can be compiled online (see footnote 8) as
Ugens and in turn can then be integrated in the
HOADec PseudoUgen class template.

7 https://bitbucket.org/ambidecodertoolbox/adt.git
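
For a regular layout (case 1), the decoder can be written down without a numerical inversion. The Python sketch below is not part of the library: it assumes a first order signal in ACN order with N3D normalisation and an octahedron of six speakers on the coordinate axes. For this layout the encoding matrix Y satisfies Y Y^T = L I with L = 6 speakers, so the pseudo-inverse reduces to Y^T / L.

```python
import math

# Unit vectors (x, y, z) of an octahedral speaker layout.
SPEAKERS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def sh_o1(x, y, z):
    """First order real spherical harmonics (ACN, N3D) for a unit vector."""
    s3 = math.sqrt(3.0)
    return [1.0, s3 * y, s3 * z, s3 * x]


def decoder_matrix(speakers):
    """One row of decoding gains per speaker: pinv(Y) = Y^T / L here."""
    n = len(speakers)
    return [[c / n for c in sh_o1(*d)] for d in speakers]


def decode(decoder, bformat):
    """Speaker feeds for one B-format frame."""
    return [sum(g * b for g, b in zip(row, bformat)) for row in decoder]


def reencode(speakers, feeds):
    """B-format frame produced by the speaker feeds (for checking)."""
    out = [0.0, 0.0, 0.0, 0.0]
    for direction, feed in zip(speakers, feeds):
        for k, c in enumerate(sh_o1(*direction)):
            out[k] += c * feed
    return out


decoder = decoder_matrix(SPEAKERS)
feeds = decode(decoder, sh_o1(1, 0, 0))   # plane wave from the front
```

All decoder gains have similar magnitudes, and re-encoding the speaker feeds returns the original B-format frame, which is the property a matrix inversion is meant to guarantee.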

7 The distance of sound sources 

One novel aspect of the underlying Faust
implementation of Ambitools is the spherical
encoding of sound sources using near field filters.
For the correct reproduction of the HOA scene, the
distance of the sound source and the radius of
the reproducing (virtual) speaker array need to
be set. The correct near field filters are applied
by setting the speaker radius either in the
encoding or in the decoding step.


// load the binaural filters
HOADecLebedev26.loadHrirFilters()
{
var src;
src=HOAEncoder.ar(
3,//order
PinkNoise.ar(0.1),//source
az,//azimuth
ele,//elevation
plane_spherical:1,
radius:2,
// set the speaker radius here
speaker_radius:1);
HOADecLebedev26.ar(
3,//order
src,//source
// or set the speaker radius here
// speaker_radius:1,
hrir_Filters:1)
}.play;




8 HOA and SynthDefs 

The use of PseudoUgens leads to one important
caveat when working with SynthDefs. The
Ambisonics order is an argument pertaining to the
PseudoUgen; it can hence not be an argument
of a SynthDef. The reason is that at compile
time the Ambisonics order would remain undefined
and the PseudoUgen would hence not know
which Ugen to return. When working with SynthDefs,
code reusability can still be achieved as
shown in the next code listing:

// set the max order:
~order = 5;
~order.do({ |i| // iterate
SynthDef( //create unique names
"hoaSin"++(i+1).asString,
{Out.ar(0,
HOAEncoder.ar(
i+1,//increase the order
SinOsc.ar()))}
).add;
});
// play the Synths
Synth(\hoaSin1);
Synth(\hoaSin5);

8 http://faust.grame.fr/onlinecompiler/

9 HOA and Node Proxies 

For the flexible creation of typical Ambisonics
render chains, Node Proxies [Rohrhuber and de
Campo, 2011] provide an excellent tool in SC.
Node Proxies autonomously handle audio busses
and conveniently allow crossfading between audio
processes of a selected node, freeing silent
processes when the crossfade is completed. This
makes it possible to dynamically change sources in
the encoding, transforming and decoding steps of
the rendering chain. The following code example
shows a flexible scenario that changes seamlessly
from an XYZ rotation to beamforming.

~o=3;
~chn=(~o+1).pow(2);
// load hoa sound file:
~bf=Buffer.read(s,"file.wav");
// b-format file player:
~player=NodeProxy(s,\audio,~chn);
~player.source=
{PlayBuf.ar(~chn,~bf)};
// Node for xyz rotation:
~trans=NodeProxy(s,\audio,~chn);
~trans.source=
{var in;in=\in.ar(0!16);
HOATransRotateXYZ.ar(
~o,in,
yaw,pitch,roll)};
// rotate the scene
~trans.set(\yaw,angle);
// decoding
~dec=NodeProxy(s,\audio,~chn);
~dec.source=
{var in;in=\in.ar(0!16);
HOADec.ar(~o,in)};
// chain the proxies together
~player <>> ~trans <>> ~dec;

// change rotation to beamforming
~trans.source=
{var in;in=\in.ar(0!16);
HOABeamDirac2Hoa.ar(
~o,in,
az,ele)};
// direct the beam
~trans.set(\az,angle);


10 Conclusions 

We have presented an HOA library for SC. Its design results in great
flexibility and makes it a valuable tool for experimenting with HOA in
various contexts. Due to the meta approach through Faust, future
additions to the library are feasible, and we look forward to
experimenting with it in the context of VR and video gaming platforms,
and also for the creation of sound material for electroacoustic
compositions. We believe that the flexibility and live coding capacity
of SC are particularly useful in the context of HOA, where repeated
listening is essential to assess the perceptually complex mutual
interdependence of temporal and spatial sound characteristics.

Abstract

Stoat is a tool which identifies realtime safety hazards. The primary
use is to analyze programs which need to perform hard realtime
operations in a portion of a mixed codebase. Stoat traverses the call
graph of a program to identify which functions can be called from a
root set of functions which are expected to be realtime. If any unsafe
function which could block for an unacceptable amount of time is found
in the set of functions called by a realtime function, then an error is
emitted to indicate where the improper behavior can be found and what
backtrace is responsible for its call.

Keywords 

Realtime safety, static analysis, LLVM 

1 Motivation 

When using low latency audio tools, an all too common problem
encountered by users is audio dropout caused by an excessive run time
of the audio generation or processing routine. This artifact is also
commonly known as an xrun. Xruns can be generated when there’s simply
too much to calculate during the allocated time, but they can also
easily be generated by any function which takes an unreasonable amount
of real time to execute.^1 The latter category of functions typically
includes operations involving dynamic memory, inter-process
communication, file IO, and threading locks.

1 as opposed to CPU time

For low latency audio to work reliably, a frame of audio and MIDI data
must be processed within a short fixed time window. Audio callbacks are
therefore functions bound by a real time constraint, or realtime for
short. A large portion of code can have its total execution time
bounded when the size of the data is known in advance. Some code,
however, cannot be simply bounded. As a simple example, consider
prompting the user for synthesis parameters and waiting for a response.
The user could enter a response quickly or they could never provide a
response. Functions which aren’t bounded by the real time constraint
that a realtime program operates under are known as non-realtime.
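To make the fixed time window concrete, the following sketch computes
how long a callback may run for one buffer. The buffer size and sample
rate used in the note below are assumed example values, and the
function names are hypothetical:

```cpp
#include <cassert>

// Time available to process one buffer, in milliseconds.
// frames: buffer size in sample frames; sample_rate: frames per second.
inline double period_ms(unsigned frames, double sample_rate) {
    assert(sample_rate > 0.0);
    return 1000.0 * frames / sample_rate;
}

// True if a measured callback run time fits inside the period.
inline bool meets_deadline(double run_time_ms, unsigned frames,
                           double sample_rate) {
    return run_time_ms <= period_ms(frames, sample_rate);
}
```

With 128 frames at 48 kHz the window is roughly 2.67 ms, so a single
call that blocks for a few milliseconds is already enough to cause an
xrun.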

To avoid xruns, realtime programs should be composed of functions with
reasonable realtime bounds; non-realtime functions are therefore unsafe
in a program that needs to be reliable. Typically, realtime system
programmers acknowledge the timing constraint and design systems with
this limitation in mind. Simple tests may be used to identify typical
execution times as well as their variance, though it’s easy for bugs to
creep in. In particular, the C or C++ open source projects in Linux
audio frequently have architectural issues making realtime use
unreliable.^2

Maintaining a large codebase in C or C++ can make it very difficult to
know both what a given function can end up calling and when a
particular function could be called. Typically, problems start with a
mixed realtime/non-realtime system, such as one with UI and DSP
sections of code; the segregation within one codebase may not be at all
clear in the implementation. This is further complicated by the
opaqueness of some C++ techniques, such as virtual method overriding,
operator overloading, multiple inheritance, and implicit conversions.
Overall, these complications make manual verification of large scale
realtime programs difficult.

Stoat offers a solution for identifying realtime hazards through an
easy-to-use static analysis approach. Static analysis makes it possible
to identify when functions claimed to be realtime can call unsafe
non-realtime functions, even when complex C or C++ call graphs are
involved. The approach offered by Stoat makes it easy to identify these
programming errors, and fixing them can greatly improve the reliability
of low latency tools.

2 see appendix A


1.1 Prior Art 

Stoat isn’t the first tool to address the problem of identifying these
realtime safety hazards. Several years prior to the creation of Stoat,
Arnout Engelen created jack-interposer, which is a runtime realtime
safety checker [Engelen, 2012]. Jack-interposer works by causing a
program to abort if any known unsafe non-realtime function is called
within the JACK process callback. The functions which jack-interposer
identifies as unsafe include IO functions (vprintf() and vfprintf()),
polling functions (select(), poll()), interprocess communication
(wait()), dynamic memory functions (malloc(), realloc(), free()),
threading functions (pthread_mutex_lock(), pthread_join()), and
sleep().

As a runtime analysis tool, jack-interposer requires the program to be
executed to identify errors, and each error is reported as it’s
encountered. Individual errors are presented either by a message
without a backtrace or by halting the program and allowing a developer
to use a debugger. Jack-interposer shares the weakness of other runtime
tools when compared to static analysis: exhaustive testing requires the
user or a testing script to run the program through all states which
involve different logic. Doing so is a difficult, error-prone, and
tedious task. Additionally, jack-interposer was designed to be used
only with JACK clients, while Stoat works with any program, JACK based
or not.
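The runtime approach can be sketched as follows. This is not
jack-interposer’s implementation, which intercepts the real library
symbols; the namespace, the explicit checked_malloc() wrapper and the
process callback are hypothetical and only illustrate why a hazard is
seen only when the offending path actually executes:

```cpp
#include <cassert>
#include <cstdlib>

namespace rtcheck {
    thread_local bool in_realtime = false;  // set while the callback runs
    thread_local int  violations  = 0;      // unsafe calls observed so far

    // RAII guard marking the current scope as realtime code.
    struct RealtimeScope {
        RealtimeScope()  { in_realtime = true; }
        ~RealtimeScope() { in_realtime = false; }
    };

    // malloc() wrapper that records a violation inside a realtime scope.
    inline void *checked_malloc(std::size_t n) {
        if (in_realtime)
            ++violations;  // a real checker would abort or log here
        return std::malloc(n);
    }
}

// Hypothetical process callback with an allocation on a rare path.
inline int process_callback(bool rare_condition) {
    rtcheck::RealtimeScope scope;
    if (rare_condition) {
        void *p = rtcheck::checked_malloc(10);
        std::free(p);
    }
    return 0;
}

// The hazard stays invisible until the rare path is exercised.
inline void demo_runtime_check() {
    process_callback(false);
    assert(rtcheck::violations == 0);
    process_callback(true);
    assert(rtcheck::violations == 1);
}
```

A static analysis would flag process_callback() without ever running
it, since the call to the allocator is present in the call graph.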

Stoat is based on an earlier attempt at creating a more general static
analysis tool. The predecessor project, Static Function Property
Verifier, or SFPV, attempted to address the more general problem of
tracking described properties through a program’s feasible call graph
[McCurry, 2014]. SFPV used the Clang compiler’s API to record precise
source level information [Lattner, 2008]. Unfortunately the Clang API
was subject to rapid breaking changes, slow to compile, and vastly
underdocumented, so SFPV was rewritten to create Stoat. Stoat, in
comparison, uses a limited subset of the LLVM API without interfacing
directly with Clang.


2 Examples 

Both runtime and static analysis tools, including jack-interposer and
Stoat, attempt to address the same overall problem. Both aim at
detecting when a function which can be executed in a realtime thread
can call a function which may block for an unacceptable amount of time.
In C, an example of this is shown in listing 1: root_fn() can call
malloc() through two intermediate functions, unannotated_fn() and
unsafe_fn(). When Stoat is provided with an out-of-source annotation on
root_fn(), it can then use the call graph to deduce that an unsafe
function can be called.

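Listing 1: Example C Program

#include <stdlib.h>

/* forward declarations so the calls below compile */
void unannotated_fn(void);
void unsafe_fn(void);

void root_fn(void) {
    unannotated_fn();
}

void unannotated_fn(void) {
    unsafe_fn();
}

void unsafe_fn(void) {
    malloc(10);
}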


For a C program, many call graphs are relatively simple and no complex
type information is needed. C++ call graphs, however, make extensive
use of operator overloading, templates, and class based inheritance.
Listing 2 shows an example of root_fn() calling a method which may or
may not be safe, depending on which implementation of method() is
called. As the class hierarchy is available to Stoat, the root function
can be conservatively marked as unsafe, since method() would call
malloc() if o pointed to an instance of the Unsafe class. Depending
upon the workload of a particular program, this data dependency might
be satisfied very rarely, so a purely runtime based approach may not
identify the error.

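Listing 2: Example C++ Program

#include <cstdlib>

// base class assumed by the listing
class Obj {
public:
    virtual void method(void) {}
};

void root_fn(Obj *o) {
    o->method();
}

class Unsafe: public Obj {
    virtual void method(void) {
        malloc(10);
    }
};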


3 Stoat Implementation 

Stoat consists of several components. First, there is a compiler shim
to dump LLVM based metadata through LLVM IR files. Second, there is a
series of LLVM compiler passes to extract inline annotations and call
graph information. Last, there is a Ruby frontend to perform deductions
on the extracted call graph and to produce diagnostic messages and
diagrams.

Stoat uses information present in LLVM bitcode to capture the program’s
call structure. Generating bitcode for individual files can be
difficult to integrate with the build systems of complex software
projects. A similar issue was faced by Clang’s official static analysis
tools [Kremenek, 2008]. Their solution was to have a stand-in which
replaces the normal C/C++ compiler.^3 Stoat offers two compiler proxy
binaries, stoat-compile and stoat-compile++, which simplify generating
LLVM bitcode in a way similar to Clang’s scan-build toolchain. For an
autotools based project, analysing source code is as simple as running

    CC=stoat-compile CXX=stoat-compile++ ./configure && make

and then running

    stoat -r .

For each LLVM bitcode file, Stoat runs four custom LLVM passes. These
passes respectively identify: the function calls, or call graph, within
the program; the C++ virtual methods associated with each class; the
C++ class hierarchy; and in-source realtime safety annotations.

First, the call graph is constructed. Within the LLVM IR, the Call and
Invoke operations call another function, and they contain metadata
about which function is being called. For C functions this is
relatively simple. Consider the IR associated with
void foo(){bar();} in listing 3.


Listing 3: LLVM IR For C Call 

define void @foo() #0 { 
entry: 

call void @bar() 
ret void 

} 


For C++, extracting the call graph is somewhat more complex due to the
introduction of virtual methods. Virtual method calls are a structured
version of function pointer calls, and they can be identified by the
two-step process used to obtain the function pointer. First, a class
instance is converted to its virtual function table, or vtable. Then,
the method’s ID is used to extract the method from the vtable and the
resulting function pointer is called. The LLVM IR for a virtual call is
shown in listing 5 and it corresponds to the source shown in listing 4.

3 http://clang-analyzer.llvm.org/scan-build.html

Listing 4: C++ Call

void foo(void) {
    Baz *baz;
    baz->bar();
}


Listing 5: LLVM IR For C++ Call

define void @_Z3foov() #0 {
entry:
  %baz = alloca %class.Baz*, align 4
  %0 = load %class.Baz** %baz, align 4
  %1 = bitcast %class.Baz* %0 to void (%class.Baz*)***
  %vtable = load void (%class.Baz*)*** %1
  %vfn = getelementptr inbounds void (%class.Baz*)** %vtable, i64 0
  %2 = load void (%class.Baz*)** %vfn
  call void %2(%class.Baz* %0)
  ret void
}


Next, the vtable calls need to be mapped back to real functions.
Vtables are stored as global symbols and can be identified by the
“_ZTV” prefix used in normal C++ symbol mangling procedures. The class
hierarchy can be reconstructed by identifying chained constructors from
class to class.
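As a minimal sketch of the prefix test, the helpers below recognize
vtable symbols and, for the simplest case only, recover the class name.
The function names are hypothetical and this is not Stoat’s own code;
nested or templated class names use a longer encoding that is
deliberately not handled here:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if a mangled symbol names a vtable ("_ZTV" prefix).
inline bool is_vtable_symbol(const std::string &sym) {
    return sym.compare(0, 4, "_ZTV") == 0;
}

// For a simple, non-nested, non-template class the prefix is followed
// by <length><name>, e.g. "_ZTV3Baz" for class Baz. Returns "" for
// anything else.
inline std::string simple_vtable_class(const std::string &sym) {
    if (!is_vtable_symbol(sym))
        return "";
    std::size_t pos = 4, len = 0;
    while (pos < sym.size() && sym[pos] >= '0' && sym[pos] <= '9')
        len = len * 10 + static_cast<std::size_t>(sym[pos++] - '0');
    if (len == 0 || pos + len != sym.size())
        return "";
    return sym.substr(pos, len);
}
```

Applied to the names used in the listings, "_ZTV3Baz" is recognized as
the vtable of Baz, while the function symbol "_Z3foov" is not a vtable.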

With the information presented by the normal call graph and the C++
virtual methods, an augmented call graph can be constructed. First, any
call through a vtable is assumed to be able to reach any implementation
of the method in the base class or in any child class. Then,
suppression file entries are used to remove edges from the augmented
call graph to avoid the false errors typically seen in error handling
code.
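The two steps can be sketched as below. The data structures and
function names are assumptions made for illustration and do not reflect
Stoat’s internal representation: the class hierarchy maps a base class
to its direct children, and suppressed edges are given as explicit
caller/callee pairs:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // caller -> callee
using Hierarchy = std::map<std::string, std::vector<std::string>>;

// Collect `cls` and all of its descendants.
inline void collect_classes(const Hierarchy &h, const std::string &cls,
                            std::set<std::string> &out) {
    if (!out.insert(cls).second)
        return;  // already visited
    auto it = h.find(cls);
    if (it == h.end())
        return;
    for (const auto &child : it->second)
        collect_classes(h, child, out);
}

// A virtual call to cls::method from `caller` conservatively gets an
// edge to every implementation in cls or any child class, minus the
// suppressed edges.
inline std::set<Edge> expand_virtual_call(
        const Hierarchy &h,
        const std::set<std::string> &implementers,
        const std::string &caller,
        const std::string &cls,
        const std::string &method,
        const std::set<Edge> &suppressed) {
    std::set<std::string> classes;
    collect_classes(h, cls, classes);
    std::set<Edge> edges;
    for (const auto &c : classes) {
        if (!implementers.count(c))
            continue;
        Edge e(caller, c + "::" + method);
        if (!suppressed.count(e))
            edges.insert(e);
    }
    return edges;
}
```

For listing 2, a call to Obj::method from root_fn() gains an edge to
Unsafe::method, which is what allows the allocation to be found.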

The last LLVM pass looks for in-source safety annotations in the form
of __attribute__((annotate("realtime"))) and
__attribute__((annotate("non-realtime"))). These annotations can be
added to the end of a function declaration to add metadata to the
function within the C or C++ source. The annotations are augmented with
out-of-source annotations in the form of whitelist and blacklist files.
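A short sketch of how such annotations might look in a header follows.
The REALTIME and NON_REALTIME macros and the two functions are
hypothetical conveniences, not part of Stoat; the macros expand to
nothing on compilers other than Clang so the code still builds there:

```cpp
#include <cassert>

#if defined(__clang__)
#  define REALTIME     __attribute__((annotate("realtime")))
#  define NON_REALTIME __attribute__((annotate("non-realtime")))
#else
#  define REALTIME
#  define NON_REALTIME
#endif

// Annotations are placed at the end of the declaration.
float mix_sample(float a, float b) REALTIME;
void  load_preset(const char *path) NON_REALTIME;

// Definitions follow separately from the annotated declarations.
float mix_sample(float a, float b) { return 0.5f * (a + b); }
void  load_preset(const char *) { /* would touch the filesystem */ }
```

Keeping the attribute on a separate declaration avoids placing it after
the declarator of a function definition, which some compilers reject.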

Once the augmented call graph is constructed and a subset of the
functions in the program are annotated, a series of deductions can be
made. Any function which is called but never implemented is assumed to
be non-realtime if not specified otherwise. Any function which is
unannotated and called by a realtime function is assumed to be
realtime. Any function which is realtime or assumed realtime and which
calls a non-realtime function produces an error with an associated
deduction chain.
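These three rules can be sketched as a breadth-first traversal from the
annotated realtime functions. The types and names below are assumptions
for illustration; Stoat’s actual frontend is written in Ruby and is not
reproduced here:

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Mark { Unknown, Realtime, NonRealtime };

struct Program {
    std::map<std::string, std::vector<std::string>> calls;  // caller -> callees
    std::map<std::string, Mark> annotations;                // explicit marks
    std::set<std::string> implemented;                      // functions with a body
};

struct Error {
    std::string caller;              // realtime or deduced realtime function
    std::string callee;              // non-realtime function it calls
    std::vector<std::string> chain;  // from caller back to the annotated root
};

inline Mark mark_of(const Program &p, const std::string &fn) {
    auto it = p.annotations.find(fn);
    if (it != p.annotations.end())
        return it->second;
    // Rule 1: called but never implemented means non-realtime.
    return p.implemented.count(fn) ? Mark::Unknown : Mark::NonRealtime;
}

inline std::vector<Error> find_errors(const Program &p) {
    std::map<std::string, std::string> parent;  // fn -> caller that made it realtime
    std::deque<std::string> work;
    for (const auto &a : p.annotations)
        if (a.second == Mark::Realtime) {
            parent[a.first] = "";
            work.push_back(a.first);
        }
    std::vector<Error> errors;
    while (!work.empty()) {
        std::string fn = work.front();
        work.pop_front();
        auto it = p.calls.find(fn);
        if (it == p.calls.end())
            continue;
        for (const auto &callee : it->second) {
            if (mark_of(p, callee) == Mark::NonRealtime) {
                // Rule 3: realtime code calling non-realtime code.
                Error e{fn, callee, {}};
                for (std::string f = fn; !f.empty(); f = parent[f])
                    e.chain.push_back(f);
                errors.push_back(e);
            } else if (!parent.count(callee)) {
                // Rule 2: unannotated callee is deduced realtime.
                parent[callee] = fn;
                work.push_back(callee);
            }
        }
    }
    return errors;
}
```

Run on the call graph of listing 1 with root_fn() annotated realtime
and malloc() blacklisted, this reports unsafe_fn() calling malloc()
with a chain leading back through unannotated_fn() to root_fn().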

The errors can be presented in either a textual 
or graphical form. The current format includes 
the function that calls the unsafe function along 
with the deduced path. An example of an error 
flagged by Stoat is dynamic memory use within 
jalv when it is in a debug mode. 

Error #514:
serd_stack_new
##The Deduction Chain:
 - serd_writer_new : Deduced Realtime
 - sratom_to_turtle : Deduced Realtime
 - jack_process_cb : Realtime (Annotation)
##The Contradiction Reasons:
 - malloc : NonRealtime (Blacklist)

Alternatively, figure 1 shows a partial view of a graphical
representation of the call graph nodes involved in errors. When dealing
with a legacy codebase, the graphical representation tends to be
preferable as it visually shows which routines contain the most errors
and which errors are the most common. Additionally, for C++ codebases
whose errors involve long template expansions, the graphical
representation shortens the displayed names, resulting in a still large
but more manageable view of the software’s architecture.


Figure 1: Partial output from Stoat applied to ZynAddSubFX 2.5.0.^4

4 http://fundamental-code.com/2.5.0-realtime-issues.png


3.1 Limitations 

Stoat offers a number of improvements over prior art, though it does
have its limitations. Namely, Stoat doesn’t track data dependencies
which affect realtime safety. Here runtime analysis tools, such as
jack-interposer, can identify errors which Stoat isn’t able to find,
and can avoid false positives which Stoat reports.

Two primary data dependent issues which 
produce misleading results include the use of 
unsafe function pointers and the use of unsafe 
error handling code. A short example of the 
former would be: 

Listing 6: Function pointer call

void function(void (*fn)(void)) {
    fn();
}

If and only if function() is only passed realtime functions, then it is
a realtime safe function. However, the data passed into the function
isn’t analysed by Stoat, so unsafe function pointer calls are typically
overlooked.
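The data dependency can be made explicit with two hypothetical callers
of the function from listing 6. All names other than function() are
invented for illustration:

```cpp
#include <cassert>
#include <cstdlib>

static int bump_count = 0;

void bump(void)     { ++bump_count; }                // bounded work
void allocate(void) { std::free(std::malloc(10)); }  // may block

// The callee is known only from the data passed in.
void function(void (*fn)(void)) { fn(); }

void realtime_caller(void)     { function(bump); }      // safe in practice
void non_realtime_caller(void) { function(allocate); }  // unsafe in realtime code
```

Both callers produce the same edge to function() in the call graph, so
only knowledge of the argument distinguishes the safe use from the
unsafe one.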

Debug and error handling code is a common source of false positives,
and the example error from jalv shows one such case. In listing 7,
function() would be marked unsafe. The unsafe function should never be
called in practical use, and a runtime checker would not flag this
case. A similar class of issues can occur if a function has different
realtime safety behavior depending upon a flag passed to the function,
as may be the case with codebases which do not have separate functions
for realtime and non-realtime tasks.

Listing 7: Example error handling

void function(void) {
    if(fatal_error)
        call_unsafe_function();
}


3.2 Discussion 

Stoat and its predecessor, SFPV, were originally created as tools to
assist with finding issues within the ZynAddSubFX synthesizer^6 and
bringing it into compliance with realtime safety requirements. While
minor issues still exist, several users have reported improved
reliability at lower latencies compared to earlier versions. Stoat has
since been used as a verification tool in librtosc, carla, ingen, and
jalv. Ideally it will be used on more projects within Linux audio to
identify realtime hazards in the future. The use of Stoat or
jack-interposer would assist in correcting the poor user experience,
and possibly the negative reputation for stability, that realtime
hazards have created in a variety of realtime projects.

5 this includes any conditional code execution based upon constant or
non-constant data

6 http://zynaddsubfx.sf.net/

When Stoat doesn’t understand regular structure within a program, it is
relatively easy to extend. ZynAddSubFX uses roughly 500 callbacks
through librtosc, and Stoat has already been extended to automatically
annotate these callbacks. As mentioned in the limitations, function
pointers are difficult to reliably track with static analysis. librtosc
callbacks, however, have per-callback metadata which associates a
statically known function pointer with information that can be used to
identify which callbacks are expected to be executed in the realtime
environment. This process was tested in ZynAddSubFX and used to resolve
several bugs.

4 Conclusion 

Stoat offers a new method to inspect existing software projects and
direct attention towards code which may be responsible for realtime
hazards. Addressing these realtime hazards can improve the experience
within a variety of Linux audio applications and plugins. Through the
use of automated tools such as Stoat, realtime hazards can be
identified and corrected quickly. Additionally, the static analysis
approach of Stoat complements the prior art of runtime analysis that
projects like jack-interposer provide. Stoat is available at
https://github.com/fundamental/stoat under the GPLv3 license.

A Brief Survey of Realtime Safety

In order to validate the claim that “projects in Linux audio frequently
have architectural issues making realtime use unreliable”, a survey was
conducted on a sampling of Linux synthesizers. Each synthesizer
presented by http://www.linuxsynths.com/ was given a brief manual code
review (typically < 15 minutes per project) looking for common realtime
safety violations. If source code was not available or could not be
located for a code review, then the project was excluded. Projects
marked with a * have had an in-depth code review prior to the writing
of this paper. The results in table 1 show that 18 of 40 projects (or
45%) have some easy-to-identify realtime safety issue.

Outside of LMMS and ZynAddSubFX, the realtime hazards within each
project have not received additional verification. Based upon
experience working with projects not included in this list, additional
realtime hazards are expected to be observed when tools like
jack-interposer or Stoat are applied.

Table 1: Linux Synthesizer Realtime Safety Observations


Software Name             Observed Status  Notes
6PM                       likely unsafe    appears to launch threads within rt-thread
Add64                     likely unsafe    blocking gui communication in rt-thread
Alsa Modular Synth        likely unsafe    unsafe data mutex
amSynth                   likely unsafe    unsafe memory allocation in rt-thread
Borderlands               likely unsafe    unsafe locks/memory allocation in rt-thread
Bristol                   likely safe      appears safe
Calf tools                likely safe      appears safe
Cellular Automaton Synth  likely safe      appears safe
Dexed                     likely unsafe    unsafe memory allocations in rt-thread
DX-10                     likely safe      appears safe
Helm                      likely unsafe    memory allocation in rt-thread/addProcessor()
Hexter                    likely safe      appears safe
JX-10                     likely safe      appears safe
LB-302                    likely unsafe    see LMMS
LMMS*                     unsafe           unsafe locks in rt-thread, unsafe memory
                                           allocation in rt-thread, creation of threads
                                           in rt-thread, blocking communication to user
                                           interface in rt-thread, etc
Monstro                   likely unsafe    see LMMS
Mr. Alias 2               likely safe      appears safe
Mx44                      likely safe      appears safe
Nekobee                   likely safe      appears safe
Newtonator                likely safe      appears safe
OBXD                      likely safe      appears safe
Organic                   likely unsafe    see LMMS
Oxe FM Synth              likely safe      appears safe
Peggy2000                 likely safe      appears safe
Petri-Foo                 likely safe      appears safe
Phasex                    likely safe      appears safe
Samplev1                  likely unsafe    possible memory allocation in rt-thread
SetBFree                  likely safe      appears safe
Sineshaper                likely safe      appears safe
Sorcer                    likely safe      appears safe
Synthv1                   likely unsafe    possible memory allocation in rt-thread
Triceratops               likely safe      appears safe
Triple Oscillator         likely unsafe    see LMMS
Tunefish 4                likely safe      appears safe
Vex                       likely safe      appears safe
Watsyn                    likely unsafe    see LMMS
WhySynth                  likely safe      appears safe
Wolpertinger              likely unsafe    unsafe memory allocation in setParameter()
Xsynth                    likely unsafe    variety of mutexes used in the rt-thread
ZynAddSubFX*              unsafe           unsafe memory allocation in oscillator
                                           wavetable generation (the total number of
                                           realtime hazards was greatly decreased with
                                           the use of Stoat)

Abstract

Auditory Virtual Environments (AVEs) are used to simulate audio
environments in real spaces. As room in room reverberation (RRR)
systems, they augment the acoustics of spaces such as concert halls and
music theaters. Why not utilize them for theater music as acoustic
stage design, and therefore as a playable instrument?

Even more, why not tune them to extreme configurations so that absurd
acoustic situations can be realized: absurd in the sense of not normal
or not possible in real physics, using distortions in the time, space,
frequency and signal domains?

This paper discusses the conceptualization and design of an artistic
research project using AVEs for a theater production, along with some
of the new aspects of these ideas. For the multi-space theater
production of Franz Kafka’s “The Trial” for actors, singer, choir and
stage design at the Art University in Graz, networked AVEs have been
realized, utilizing Ambisonics systems in concert halls and movable
acoustic instruments in open spaces.

Keywords 

Auditory Virtual Environment, acoustic, stage design, computer music,
Ambisonics

1 Introduction 

An auditory virtual environment (AVE) is a virtual environment (VE)
that focuses on the auditory domain only. It is conceived as
independent of other modalities like vision. Nevertheless, an AVE can
also be combined with the visual domain. Depending on the application,
the user may either be a passive receiver or be able to interact with
the environment. Three different approaches for implementations of AVEs
are listed by Novo in Blauert’s book “Communication Acoustics” [Novo,
2005]:

1. Authentic reproduction of real existing environments.
The virtual room should evoke in the listener the same percepts that
would have been evoked by the corresponding real environment. The
listener should have the same spatial impression when moving through
it, and should perceive their own movement inside the environment as
well as the movements of sound sources.

2. Reproduction of plausible auditory events.
This approach tries to evoke auditory events which the listener
perceives as having occurred in a real environment. Here only those
features are implemented which are needed for a specific simulation
situation.

3. Creation of non-authentic plausible auditory events or environments.
The virtual room doesn’t evoke percepts in the listener which are
related to a real acoustic environment. It evokes auditory events on
which no authenticity or plausibility constraints are imposed,
targeting purely virtual environments like computer games.



Figure 1: Physically adjustable acoustics for Beat Furrer’s music
theater FAMA

1.1 the setting 

For the music theater production based on the novel “Der Process”
(engl. “The Trial”), written by Franz Kafka from 1914 to 1915, an
experimental theater music composition for actors, singer, choir and
stage design was to be created:

The hopeless search of the main character “Josef K” for the reason of
his arrest is the grandiose template for an inter-institutional
project: a play with perception, a sensual journey through existential
abysses and absurdities of the bureaucracy with students and lecturers
of the institute stage design / acting / singing, song, oratorio /
electronic music and acoustics, choir of the Kunstuniversität Graz.


The production was spread roughly over three scenes played in parallel
at three places visited one after the other, with a collective intro in
the foyer and a collective finale in the concert hall:



Figure 2: Ligeti hall stage design with big movable blocks


György Ligeti hall: 400 m² concert hall constructed for virtual
acoustics.

Theater im Palais (TIP): 200 m² theater hall constructed with
traditional acoustics.

Courtyard between these houses: a peepshow construction as stage
design.

After the “Intro” in the foyer, the audience was divided into 3 groups.
Each group attended a 30 minute performance in one of the places and
was guided from one place to the next during the intermissions.

1.2 the compositional approach 

Additional constraints were placed on the musical-acoustic composition
to concentrate and reinforce the ideas of the piece.

One main idea was that the experimental theater music composition for
the play “Der Process” should apply signal processing to live signals
only and should not use any pre-produced sound material. All material
should be based on sound signals recorded live with different
microphones at the places, in order to construct virtual soundscapes
for different audio reproduction systems, with virtual acoustics as a
main instrument within these sceneries.

Another constraint was that all places should be treated as networked
AVEs.



Figure 3: TIP stage with big hole in the middle, 
traditional chariot-and-pole-system 

1.3 the experimental approach 

The artistic research question was: can a non-plausible, non-authentic
AVE, applied as a complex musical instrument for theater music, produce
varying plausible acoustic sceneries?

As an extension, the AVE should use distortions in the space, time,
spectrum and signal domains and thereby produce a distorted AVE which
is still perceived as acoustics, albeit an absurd acoustics. Therefore
the production was titled “AVE-Absurdum”. With this concept, the three
categories of AVEs should be extended by a fourth category, let us name
it “absurd AVE”, which is non-authentic but plausible in an absurd way
of reception and with respect to the visual domain. This AVE does evoke
percepts in the listener which are related to a real acoustic
environment and to the live sound produced by the actors and other real
sound sources.

Ambisonics should be used as the common 3D audio representation, also
to allow simulations of these AVEs in the development phase prior to
the first rehearsals.

Ambisonics was chosen not only because of the already implemented
Ambisonics system in the concert hall, which has been very well tested
in previous productions like “Pure Ambisonics”, but also for streaming
the acoustical impact of one room to another. For this purpose, spatial
recordings and mixes were used as 3D audio streams, so that the spatial
information of the audio signals can be used in other spaces.
Ambisonics can also be used with directional speaker systems, as used
for the movable acoustics in between the spaces.

Figure 4: Loudspeakers in the Ligeti hall: Ambisonics L, subwoofer S,
extra E, and microphones M. The height of the speakers increases
towards the middle from 2 m to 8 m.

LAC2017 - CIEREC - GRAME - Universite Jean Monnet - Saint-Etienne - France

As a stage design using processes as backdrop, like an additional layer
on the theater music itself and like a big invisible ensemble of signal
processing algorithms, the generated sound environment represents a
complex machine. From another perspective, these AVEs can therefore be
seen as part of the “theater machine”, in the sense of Gilles Deleuze’s
concept of machines [Raunig, 2004; Deleuze and Guattari, 1977].

2 AVE Absurdi 

For the three spaces, three different implementations of Ambisonics
have been designed:

Ligeti-Saal: 4th order Ambisonics with 32 Ambisonics speakers and 2
subwoofers, 7 directional microphones hanging from the ceiling, 2
headsets for the main actors, 2 pickups on the floor and 2 on the
blocks of the stage design.

Theater im Palais (TIP): 2D Ambisonics ring on the ceiling, subwoofer,
2 microphones for reverberation, 2 microphones for enhancement of
special places, 1 headset for triggering only.

Courtyard between the two houses: movable spherical directional
loudspeaker driven by embedded Linux computers connected to a
multichannel amplifier and a directional microphone, powered by
batteries and played by actors.

The Ambisonics order used in the different places is normally defined
by the amplification system. Since we also work here with Ambisonics
streams and with virtual microphones detecting different signals, the
maximum order is limited by the encoding system.

The Ambisonics system at the TIP was not satisfying, since the stage
was designed as a proscenium stage with the audience along one wall,
and it was canceled by the director there, who wanted a purely stereo
frontal speaker system. Nevertheless, streams from the other places
were used for the sound environment.

In the following, the space in the Ligeti hall and the movable
acoustics are discussed.

2.1 AVE in Ligeti concert hall 

Varying playable acoustics had been developed earlier, in the role of
acoustician, for Beat Furrer’s music theater FAMA. As a stage design, a
real room in room for 200 listeners, with rotatable wall elements that
are absorbers on one side and reflectors on the other, was built like a
huge machine. One constraint was to use no electro-acoustic elements.
In contrast to these physically adjustable acoustics, electro-acoustic
AVEs were to be implemented here.

The already installed Constellation Acoustic System from Meyer Sound
[Sound, 2010], with about eighty small speakers and about twenty
microphones at a height of 5 meters, is a closed system and was not
flexible enough to fulfill the requirements of the idea of an “AVE
absurdum”.

The speakers used for the 3D Ambisonics system are shown in figure 4.
31 active Kling & Freitag speakers have been used. The first 29 of them
are mounted on pantographs, which can adjust the height and direction
of each speaker individually as presets. The L22 speaker had to be
mounted higher because of the blocks of the stage design, and has been
supplemented by two others, L30 and L31, on stands on the floor to
lower the acoustical horizon. In addition, two subwoofers have been
placed left and right in the front corners, both to enhance the
Ambisonics sound and for special subsonic effects.

The hemisphere was slightly expanded to an ellipse and stretched to the
front to enlarge the “sweet spot”. With this number of speakers a 5th
order Ambisonics system could be realized. However, because of all the
obstacles and the additional movable blocks, using 3rd order Ambisonics
gave smoother results for moving sources, increased spatial continuity
and avoided spatial aliasing errors across the room, which resulted in
a bigger “sweet spot”.

As decoder, the standalone decoder of the AmbiX plugin suite
[Kronlachner, 2013] was used, for which Matthias Frank from the IEM
calculated a suitable AllRAD decoder [Zotter and Frank, 2012]. For the
preproduction of effects and the development of the AVE in a studio or
over headphones, a binaural decoder with a special set of impulse
responses, measured in the Ligeti hall, was provided.

The decoder was fed, using “jackd”, with Ambisonics signals from
applications within the Linux computer and from other computers
over a MADI audio interface input routed through the Lawo mixing
console. Thus three computer musicians were able to play in
parallel using the same AVE system over one central decoder feeding
the speakers. The subwoofer management was done in the mixer, using
the Ambisonics signals and an additional subsonic effect channel
for special effects.

The AVE-Absurdum was implemented with Pure Data [Puckette, 1996],
running patches on different computers connected over MADI audio
interfaces. The main computer implemented an Ambisonics mixer with
the room-in-room reverberation system (RRR), derived from the
CUBEmixer [Ritsch et al., 2008] development of previous years and
the “acre” Pd extension library with the Ambisonics toolbox module
for Pd developed for this purpose, “acre-amb” [Ritsch, 2016], which
uses the “iem-ambi” external library.

“acre-amb” is a collection of high-level Pd abstractions that
implement Ambisonics functionality for mixing and processing
multichannel signals and controls, to be used in compositions and
effects. A further goal was to easily integrate Ambisonics encoders
and decoders with calibration and speaker distribution, and to
provide connection and processing facilities aimed at fast
prototyping of new Ambisonics algorithms.



Figure 5: Consoles (from left): Lawo mixing desk, controller for
effects, computer console with Pd patch and AmbiX decoder,
controller for time machine and memory player, spectral Ambisonics
with notebook as controller

2.1.1 Room in Room Reverberation for 
acoustics 

The core of the RRR system is a multichannel reverberation system
with 6 inputs, 12 early reflections and 6 late reverb channels to
be spatialized in the 3D space of the AVE.

It was not possible within the production time to mike a choir of
110 singers, especially because they sometimes move erratically in
the room. Given the limited rehearsal time, we had to find a
solution in which the actors and choir can play with different
absurd acoustics, with the conductor and movement director
exploring and fixing these effects during rehearsals, starting with
the very first ones, and which eliminates the need to track the
movement of the choir and actors for the composition.

The solution chosen was to enable “active zones”, areas under
microphones where choir, actors and audience are enhanced and
encounter different acoustic feedback. These effects can be
switched or cross-faded for each scene. Actors can also be in a
different acoustical space, parallel to the choir or audience.

Therefore the RRR was driven directly by microphones at heights of
3-4 m, enabling different playable acoustics, from small garage
reverb and long tunnels with scattered echoes to big halls, and
even further to echoes as from surrounding buildings or mountains,
allowing early reflections of more than 200 ms. This makes it
possible to build non-plausible acoustics, such as reflections with
increasing energy and/or different acoustical rooms within one
room: an absurd AVE.

In addition to limiting the output, in particular to reduce
feedback through the reverb, each microphone got a feedback
suppression EQ for the 3 most resonant frequencies of the room.
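
The text does not say how the feedback suppression EQ was built. One common approach, assumed here purely for illustration, is a cascade of narrow notch filters, one per resonant frequency. The sketch below computes standard second-order notch coefficients and evaluates the response of the cascade; the three frequencies and the Q value are hypothetical, not measured values from the Ligeti-Hall.

```python
import cmath
import math


def notch_coeffs(f0, fs, q=30.0):
    """Second-order notch filter coefficients, normalised so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def magnitude(sections, f, fs):
    """Magnitude response of a cascade of second-order sections at frequency f."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1.0 + 0j
    for b, a in sections:
        num = b[0] + b[1] * z1 + b[2] * z1 * z1
        den = a[0] + a[1] * z1 + a[2] * z1 * z1
        h *= num / den
    return abs(h)


FS = 48000.0
ROOM_MODES_HZ = [110.0, 220.0, 440.0]  # hypothetical resonant frequencies
sections = [notch_coeffs(f, FS) for f in ROOM_MODES_HZ]
```

Each notch removes a narrow band around one resonance while leaving the rest of the spectrum almost untouched, which is why such a cascade can raise the usable gain before feedback sets in.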

2.1.2 Distortion in Space 

Within the RRR, by spatializing early reflections from only one
direction or placing all late reverb on the other side, the
acoustical space can be shaped, e.g. imaging a big room in one
direction and a wall in the other. This can be done dynamically,
with a sudden appearance of late reverb from one side. Changing
acoustics are perceived better than static ones: since the size of
a room normally does not change, such changes draw more of the
listeners' attention.

An overlap of one acoustic space over another seems more unnatural,
but since most singers and actors use reverb on stage, the audience
is used to this effect.

2.1.3 Distortion in signal 

Another effect was inserting processing into the microphone signal
path. Within this project three types were tested:

spectral Resonances and filters 

dynamics Limiter, Compressor and Expander. 

shaping Waveshaping: tubes, metal strings, 
noise (cut) and also string simulator, metal 
plate. 

Spectral filters have the same effect as different spectral
properties of reflecting materials and are therefore only
spectacular if applied really strongly. Changing them dynamically
changes the whole “sound color” of the scenes.

Dynamics processing was mostly applied to singers and actors.
Nowadays audiences are widely familiar with these effects for solo
performers, but applying them in the extreme, so that silent
passages become loud and loud voices decrease in volume, is a
strong effect, and one that singers do not like. It was therefore
used to increase the struggle of the actor against the environment,
here the acoustics. The drawbacks are that it was really hard to
control without feedback in silent phases and that it can very
easily be perceived as a mistake in the performance.

A really strong effect is distortion, especially of the early
reflections in the reverb: a tube shape gives the room a warm
sound, and using nonlinear shapes introduces noise. In addition,
metallic distortion such as a ring modulator with two input signals
within a parallel dialog can produce really scary rooms. As a
drawback, feedback is again an issue, so limiters mostly have to be
used.



Figure 6: Choir surrounding the audience and 
actors behind a transparent curtain 

2.2 Distortion in Time 

We called it “artistic time-stretching”; it is an ongoing research
project by Manuel Planton at the IEM, in which time-stretching is
to be applied in live situations. Time-stretching and real time are
clearly a contradiction, since stretching leads into the past,
which means the signal is no longer within the real-time
constraints.

Three different phases of perception were experienced:

• time-stretched signal within the early reflections delay,
< 80 ms

• echoes up to 300 ms

• playback of a detached recording of the signal, > 1 s up to
infinity

Playing within these phases is the artistic approach, where voices
have to be slowed down first and then sped up again. Doing this,
the sound is first amplified, then scattered, and then becomes a
dialog with the live signal; even more, it seems like a replay,
like a “déjà-vu” experience. It turned out to be a thrilling
effect, which actors liked to play with. It was used on solo pieces
of actors and repetition phases of the choir. Introducing feedback
loops of the signal into the time-stretcher, optionally combined
with pitch shifts, expands the possibilities of this effect even
further. It was therefore used solo for some scenes, with the
effect signal spatialized independently of the position of the
source.

For the implementation a dedicated Pd external was written, using
the rubberband library and additionally an overlap-and-add (OLA)
algorithm. Considering the limited rehearsal time and the big
parameter space, the time-stretcher had to be played interactively
by an additional electronic musician observing the actors. As an
instrument of its own, the Pd patch was run on a separate computer
with its own controllers, mixed in via an Ambisonics bus signal.
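
To illustrate the overlap-and-add idea only, the following is a naive offline OLA time-stretcher in plain Python. It is a sketch under stated assumptions, not the Pd external described above: it does not use the rubberband library, it works on a complete buffer instead of a live stream, and the frame and hop sizes are arbitrary choices.

```python
import math


def ola_stretch(x, factor, frame=256, hop_out=64):
    """Naive overlap-add time-stretch; factor > 1 slows the signal down."""
    hop_in = hop_out / factor  # analysis hop, may be fractional
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame)
              for n in range(frame)]
    n_frames = max(1, int((len(x) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights for normalisation
    for i in range(n_frames):
        src = int(round(i * hop_in))  # read position in the input
        dst = i * hop_out             # write position in the output
        for n in range(frame):
            if src + n < len(x):
                y[dst + n] += window[n] * x[src + n]
            norm[dst + n] += window[n]
    return [v / w if w > 1e-9 else 0.0 for v, w in zip(y, norm)]
```

Frames are read from the input with a smaller hop than the one used for writing, so the same material is spread over a longer output. Plain OLA of this kind smears pitched material, which is one reason a dedicated library is normally preferred for tonal signals.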

2.3 Spectrum processing 
Figure 7: Spectre Pd GUI


Another special effect was spectral distortion over space, which we
named “spectre”, developed and played by Christoph Ressi. With a
special patch using small FFTs/IFFTs, the spectral information was
split into several channels, which were spatialized in the 3D
space. High frequencies could be played from a different direction
than low ones and spread over the hemisphere. Drawing tables
control the movements and spreading. A feedback loop within the
effect was also introduced, so it can do a kind of spectral
freezing. This development was used to process the choir input
signals and distribute them in the space. The choir chant tends to
be an acoustical environment in itself, especially if the choir is
surrounding the audience. On transient signals with fast glissando
elements, like shouting, clapping and stamping, the effect is
audible as a rapid movement of the sound in the room. On long
notes, especially chords, the room begins to feel as if it had
stretched and softened walls, because it is hard to hear any
dimension, since reflections are masked by direct sound. The effect
was used in one scene as a solo acoustic performance and frozen
during conversions.



Figure 8: Development prototype of the Linux player with 8x100 W
for 2 tetrahedron speakers: decoupled USB 2/8-channel, dc/dc,
Olimex A20, 2x class-D amplifier, to be powered by a 12 V battery

3 Movable Virtual Acoustics 



Figure 9: Agent with microphone playing 2 directional speakers
For virtual acoustics in the courtyard, where no speakers or fixed
installation were available, a special concept of moving acoustics
was devised. Operated by electronic musicians as “agents” in the
play, the movable acoustics instruments with AVE function and
stream rendering features were integrated into the play by the
director.

Using spherical loudspeaker arrays allows us to beam sound in many
directions utilizing Ambisonics signals. With the walls of
buildings and rooms around the courtyard, reflections can be
induced, which produces a kind of surround sound. The simplest of
the spherical geometries is the tetrahedron, which has been used
before in a performance enhancing the room acoustics of a church
[Robert lepenik, 2014]. The tetrahedron loudspeaker has 4 wideband
speakers, one mounted on each face, and can be placed on a portable
stand. The electronics consist of a 4x100 W class-D amplifier
supplied by a 12 V, 12 Ah rechargeable battery, driven by an
“Olimex ARM-A20” embedded computer with a hacked multichannel USB
audio interface, a phantom-power microphone preamp, speaker cable
and a microphone connected over XLR cable. The agents can carry the
whole electronics in their bags and hold the microphone and
speaker.

A directional microphone was chosen for interaction with the
surroundings, so the agents can focus on and play with the sound
input of the environment using a kind of AVE patch. Receiving the
Ambisonics streams from the other spaces and using an additional
virtual microphone, they can select signals from other performances
to be combined in the audio scene.

The whole signal processing was done by a Pd patch including
different effects like feedback with reverb, pitch shifting, delays
etc. to realize a movable AVE. This work was named
“AVE-tetrahedron” and was explored experimentally on the campus
beforehand.

To play this instrument small controllers 
mounted to the arms have been used. 

4 Ambisonics network 

Streaming Ambisonics was developed for the COMEDIA project [Ritsch,
2010]. Using this technique, the 3D acoustic signal of a room can
be delivered to other spaces, e.g. broadcasting the 25-channel
Ambisonics signal from the Ligeti hall to others. The receiver can
choose the AVE and place virtual microphones inside it, using a
controllable Ambisonics decoder.


Figure 10: Tetrahedron drive for AVEs (diagram: tetrahedral speaker
control feeding speakers 1 to 4)

For streaming, scripts for “gstreamer” were written as transmitter
and receiver, connected via “jackd” to Pd. This allows an
adjustable and acceptable latency with sufficient buffering for
different situations.



Figure 11: Network of AVEs 


5 Conclusions 

The whole production was a big success, judging from the reaction
of the audience and the participants. The AVE concept was accepted
after some persuasion and explanation of the concept to all
participants. Because of the limited time, there was not much room
to criticize and overthrow the concept, so even though time was
very tight we tried to stay as close to the concept as possible, or
to drop it, as with the TIP.

To focus on transformations and not so much on sound effects was a
very wise decision, since effects bring too much additional
parallel content, whereas transformations are less invasive in
supporting the idea of Kafka’s absurd world.

The concept of AVE absurdum as a playable instrument has been
proven in the Ligeti hall. The movable acoustics instruments work
nicely in small areas. Simple effects like distortion in time work
surprisingly well. Imprinting the acoustics of one room on another
also works fine in most situations, but since listeners are used to
this from media perception, it is not spectacular.
Abstract

This study describes a posture classification method 
for a marker-free depth camera. The method con¬ 
sists of an object identification procedure, feature 
extraction, and a naive Bayesian classification ap¬ 
proach with a supervised training. Point clouds 
obtained from the depth camera are split into ob¬ 
jects. For each object a set of features is ex¬ 
tracted. A method of feature pre-processing is pro¬ 
posed and compared against a statistical orthogo- 
nalisation method. Using a manually labelled train¬ 
ing data set, the probability distributions for the 
Bayesian classification are obtained. As a result of 
the classification, the most likely gesture is assigned 
to each object in real time. Classification perfor¬ 
mance was tested on a separate data set and reached 
about 80%. 

Three different applications are described: Auto¬ 
matic estimation of user postures to estimate the 
influence of hearing devices on user behaviour in 
communication situations, the control of an inter¬ 
active audio-visual art installation, and interactive 
light control on a dance-floor setup with multiple 
dancers. Classification performance in these appli¬ 
cations was measured and discussed. 


Keywords 

gesture classification, behaviour analysis, hearing 
devices, interactive art, subject-in-the-loop 


1 Introduction 


With the development of assistive technologies, there is a growing
need for robust automatic identification of human postures and
gestures. Gesture recognition is used for improving human-machine
communication, e.g., in hand gesture-based device control [Freeman
and Weissman, 1997; Richarz et al., 2008]. Another use case is the
classification of gestures and postures that describe the subject’s
behaviour or provide information on the current state of the
subject [Busso et al., 2008; Melo et al., 2015].


Automatic recognition of various postures has potential
applications in research areas where the test subject’s behaviour
is analysed. As an example from hearing research, in typical
communication situations, leaning forward while listening is
associated with a high listening effort, whereas sitting more
relaxed indicates a lower effort [Paluch et al., 2015]. Manual
labelling of user behaviour in similar tasks
is usually time consuming and is not sufficient 
in case of the ’subject-in-the-loop’ experiments, 
where the measurement is controlled by the re¬ 
sponses of the test subject. Interaction between 
the subject reactions and the measurement pro¬ 
cedure is desired when aiming at more realistic 
experimental conditions, but can also provide 
additional performance measures from the ex¬ 
perimental feedback loop. ’Subject-in-the-loop’ 
experiments require a real-time classification of 
gestures and postures. This differs from conven¬ 
tional behavioural experiments where a post- 
hoc analysis of the data is possible. Besides re¬ 
search applications, machine control functions 
based on natural postures are possible, e.g., a 
hearing device could increase the noise reduc¬ 
tion efficiency when the user’s change in pos¬ 
ture indicates a higher listening effort. Such an 
application would require a body-worn motion 
tracking sensor, e.g., accelerometer and gyro¬ 
scope embedded into a hearing device. 

Gesture and posture recognition tools are also applied in music and
arts [Ciglar, 2008; Donnarumma, 2011]. Typically, an artist controls


music generation and modification tools with 
gestures, resulting in a mixture of dance and 
music performance. The classification system 
proposed here is designed to be useful for mu¬ 
sic and art applications with multiple users. 
One application is an audio-visual installation, 
where the postures of the audience influence the 
sound and vision. Another potential real-time 
application is the live interactive light and mu¬ 
sic control system for a dance-floor. 

Real-time analysis of postures and gestures from depth images is
commonly achieved via skeleton modelling [Shotton et al., 2013]. In
the applications of this study, such a high-level posture model is
not required, because only a limited number of posture and gesture
classes need to be discriminated. Furthermore, these applications
require a computationally fast method of classification. For this,
a naive Bayesian classifier was used in this study. This simple
classification method can deal with low-dimensional data and
requires only a limited amount of training data [Ashari et al.,
2013; Gupte et al., 2014]. For discriminating only a small set of


classes, low level features describing the coarse 
point cloud distributions and the velocities of 
certain point cloud areas can be used. However, 
to fulfil the implicit statistical assumptions of 
the naive Bayesian classifier, and to identify the 
most relevant application-specific feature sets, a 
pre-processing of features may be beneficial. 

In this paper, methods of point cloud processing (sections 2.1 to
2.3), feature pre-processing (section 2.4) and classification
(section 2.5) are described. In section 2.6 the training conditions
in three different applications - posture classification for
hearing research, multi-user control of an audio-visual art
installation, and individualised light control for a dance-floor -
are explained. Classification performance in the different
applications with the proposed pre-processing methods is given in
section 3 and discussed in section 4.


2 Methods and apparatus 

For this study, one or more subjects were tracked using a Microsoft
Kinect depth camera. Although the final applications of this
gesture and posture classification approach differ significantly,
they all have the same structure, which is depicted in Fig. 1.
First, the camera data was filtered for a more robust point cloud
estimation and background removal. In a second step, the point
cloud was split into multiple objects. For each object, a set of
features was extracted, and based on this feature set, the posture
or gesture of each object was classified. The point cloud
processing and classification was implemented in the openMHA
hearing device signal processing platform [Herzke et al., 2017;
Grimm et al., 2009; Grimm et al., 2006]. Training and data analysis
was implemented in Matlab. These processing blocks are described
below.


2.1 Noise reduction and background 
removal 

Figure 1: Structure of the proposed gesture and posture
classification framework.

The Microsoft Kinect depth camera is an optical sensor which
measures the depth through the parallax of an infrared laser grid.
It provides a depth value $d$ for each pixel position $(k,l)$.
Invalid values (e.g., occlusion, absorption) get the depth value
$d = 0$. In this study, the depth was scanned with a frame rate of
10 Hz.

Absorbing objects and objects with a very 
uneven surface, e.g., curly hair, typically re¬ 
sult in invalid data points for many frames. To 
increase robustness in such conditions, invalid 
values were replaced by the last available valid 
value, if a value was measured within the last 
second. 

For the classification of objects it was es¬ 
sential to separate them from the background. 
Therefore, in an initial phase without subjects 
in the sensing area, the background depth was 
measured, and all depth values close to the 
background were removed. After this step, only 
those data points remain which were assumed 
to belong to a relevant object. 
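
The two cleaning steps of this section can be sketched as follows. This is an illustrative Python sketch, not the openMHA implementation: the class name, the flat per-pixel list layout and the `tolerance` parameter (how close to the background a depth value must be in order to be removed) are assumptions. The hold time of ten frames follows from the 10 Hz frame rate and the one-second limit given above.

```python
FRAME_RATE_HZ = 10               # frame rate stated in the text
MAX_HOLD_FRAMES = FRAME_RATE_HZ  # "within the last second"


class DepthCleaner:
    """Hold the last valid depth value and remove background pixels."""

    def __init__(self, background, tolerance):
        self.background = list(background)  # depth per pixel, empty scene
        self.tolerance = tolerance          # hypothetical closeness threshold
        self.last_valid = [0.0] * len(self.background)
        self.age = [MAX_HOLD_FRAMES + 1] * len(self.background)

    def process(self, frame):
        out = []
        for i, d in enumerate(frame):
            if d > 0:
                self.last_valid[i], self.age[i] = d, 0
            else:
                # invalid value: reuse the last valid one if recent enough
                self.age[i] += 1
                d = self.last_valid[i] if self.age[i] <= MAX_HOLD_FRAMES else 0.0
            bg = self.background[i]
            if d > 0 and bg > 0 and abs(d - bg) < self.tolerance:
                d = 0.0  # close to the background: not part of an object
            out.append(d)
        return out
```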

2.2 Edge detection and object grouping 

An assumption for the object grouping was that all objects have a
spatial separation, i.e., either the depth was not continuous or
the objects were separated by background pixels. This allows the
use of a simple first-order gradient edge detection algorithm on
the depth data. A pixel $(k,l)$ was an object boundary if the depth
gradient was above a threshold $d_p$:

$$(d_{k,l} - d_{k+1,l})^2 + (d_{k,l} - d_{k,l+1})^2 > d_p^2 \qquad (1)$$


To construct objects, a generic flood fill algorithm [Torbert,
2012] was applied to identify all pixels within a closed boundary.
These pixels were marked on an object map with their object number.
The set of pixels $(k,l)$ belonging to one object was $P$, which
was then used for further object-specific processing if the number
of elements of $P$, $p$, was sufficiently large.
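
A minimal sketch of the gradient test of equation (1) combined with a breadth-first flood fill is given below. It is only an illustration: the function name, the nested-list depth map and the 4-neighbour connectivity are assumptions, and boundary pixels are simply left out of the objects.

```python
from collections import deque


def label_objects(depth, threshold, min_pixels=1):
    """Group non-boundary pixels of a 2D depth map (0 = no data) into objects."""
    rows, cols = len(depth), len(depth[0])

    def is_boundary(k, l):
        # first-order depth gradient, equation (1)
        d = depth[k][l]
        dk = d - depth[k + 1][l] if k + 1 < rows else 0.0
        dl = d - depth[k][l + 1] if l + 1 < cols else 0.0
        return dk * dk + dl * dl > threshold * threshold

    labels = [[0] * cols for _ in range(rows)]  # object map
    objects = []
    for k0 in range(rows):
        for l0 in range(cols):
            if labels[k0][l0] or depth[k0][l0] <= 0 or is_boundary(k0, l0):
                continue
            number = len(objects) + 1
            pixels, queue = [], deque([(k0, l0)])
            labels[k0][l0] = number
            while queue:
                k, l = queue.popleft()
                pixels.append((k, l))
                for kk, ll in ((k + 1, l), (k - 1, l), (k, l + 1), (k, l - 1)):
                    if (0 <= kk < rows and 0 <= ll < cols
                            and not labels[kk][ll]
                            and depth[kk][ll] > 0
                            and not is_boundary(kk, ll)):
                        labels[kk][ll] = number
                        queue.append((kk, ll))
            objects.append(pixels)
    # keep only objects with a sufficient number of pixels
    return [p for p in objects if len(p) >= min_pixels]
```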


Coordinate transformation. At this stage the objects were defined
by a set of pixels with a certain depth from the camera. For a
robust feature extraction, these have to be transformed into a
world coordinate system. In the first step, pixel data was
transformed into a camera coordinate system
$\mathbf{x}_c = (x_c, y_c, d)^T$, with the horizontal distance from
the camera axis $x_c$, the vertical distance $y_c$ and the distance
from the camera $d$. These coordinates were linearly approximated
by

$$(x_c, y_c) = a\,(k - k_0,\; l - l_0)\, d_{k,l}. \qquad (2)$$

$(k_0, l_0)$ was the central pixel of the camera. World coordinates
$\mathbf{x} = (x, y, z)^T$ ($x$ distance along camera axis, $y$ to
the left, $z$ upwards) were calculated by rotation and translation
of the camera coordinates. These point clouds $P$ were the basis of
further feature extraction of each object. The object centre was
$\bar{\mathbf{x}} = \langle \mathbf{x} \rangle_P$,
i.e., the mean of all points in the point cloud $P$.
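
As a rough illustration of these steps, the sketch below applies equation (2) and then one possible rotation and translation. The text does not specify the rotation, so a single camera tilt angle, the sign conventions (positive $x_c$ to the right, positive $y_c$ upwards) and all function names are assumptions.

```python
import math


def pixel_to_camera(k, l, d, k0, l0, a):
    """Equation (2): linear approximation of the camera coordinates."""
    return (a * (k - k0) * d, a * (l - l0) * d, d)


def camera_to_world(xc, yc, d, tilt_rad, cam_pos):
    """Tilt about the horizontal axis, then translate by the camera position.

    World axes as in the text: x along the camera axis, y left, z up.
    """
    x = d * math.cos(tilt_rad) - yc * math.sin(tilt_rad)
    z = d * math.sin(tilt_rad) + yc * math.cos(tilt_rad)
    y = -xc  # assumed: positive x_c points to the right
    return (x + cam_pos[0], y + cam_pos[1], z + cam_pos[2])


def centre(points):
    """Object centre: mean of all points of the point cloud."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```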

Temporal alignment of objects. At this point, the order of detected
objects depends on the first object pixel position in the camera
plane. This is not a robust measure, thus the object order may
change from frame to frame. However, to allow for analysis of
time-related features, the objects were re-ordered based on a
similarity measure of distance $d$ and the object size ratio $r$
between consecutive frames. The distance between the objects $o$
and $q$ at the time indices $t$ and $t-1$ was defined as
$d_{o,q}(t) = \|\mathbf{x}_o(t) - \mathbf{x}_q(t-1)\|$. The size
ratio was $r_{o,q}(t) = e^{|\ln(p_o(t)) - \ln(p_q(t-1))|}$. Then
the coherence matrix $\mathbf{C}(t)$ between two objects was
defined by its elements

c 0 , q (t) = r 0 ,q(t)e-^-« (3) 


dered to maximise the elements on the diago¬ 
nal, corresponding to a maximal temporal co¬ 
herence. 

2.3 Feature extraction 

A list of all extracted features and their labels can be found in
Table 1. Features corresponding to the object in the global
coordinate system as well as features describing size and
distribution of the point cloud $P$ relative to its centre were
extracted. The object rotation was estimated from the ratio of
depth to width. Two methods of calculating point cloud distribution
were tested: In the first method, weighted averages across $P$ were
calculated. For example, the average left bottom position was
estimated by using a weight $w$ with

$$w = \begin{cases} (z - z_{\max})^2 + (y - \langle y \rangle)^2 & y > \langle y \rangle \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

To account for dynamic properties, which 
may be important for gesture classification, in 
addition to the above mentioned point cloud 
distribution related features, the absolute value 
of their temporal derivatives was calculated. 

These features define the time-variant feature 
vector f (t) which was used as an input of feature 
pre-processing. 


2.4 Feature pre-processing and 
optimization 

Before the actual classification, the features f 
were pre-processed with a method V to max¬ 
imise the classification performance, 




The pre-processing method V was a combination of temporal low-pass
filtering with the time constant $\tau$, selection of an optimal
feature set F, and PCA.

The pre-processing method V was iteratively optimised. In each
iteration cycle $m$, the classifier was trained on the
pre-processed training data set, whereas the classification
performance to which this pre-processing method $V_m$ led was
computed using the test data set, pre-processed in the same way as
the training data. The pre-processing method $V_m$ which gave the
best classification performance was chosen as the final
pre-processing method for classification.


with a weighting coefficient $\gamma = 10$. For a re-sorting of objects, the columns of C were or-

Orthogonalisation. The naive Bayesian classifier used in the current work assumes
name                            label

global coordinates:
  number of pixels p            n.n
  mean position x               n.x, n.y, n.z
  median position               n.xmed, n.ymed, n.zmed
  rotation                      n.rot

local coordinates:
  size                          n.sx, n.sy, n.sz
  thickness                     n.r
  segment positions             o.lx, o.lz, o.rx, o.rz, o.lby, o.rby
  segment thickness             n.r1, n.r2, n.r3
  z-quantiles                   n.z25, n.z50, n.z75

velocities:
  object velocity               o.vz
  size changes                  o.vsy, o.vsz, o.vsx
  vertical segment velocities   o.vlz, o.vlz, n.vz1, n.vz2, n.vz3
  horizontal segment velocities n.vxy1, n.vxy2, n.vxy3
  angular velocity              n.vrot

Table 1: List of features per identified object. The features were calculated by two different
implementations, as indicated by the prefix o and n.


conditional independence of all the features. 
This means, that adding features which are 
highly correlated with other features might 
degrade the performance of the classifica¬ 
tion. Therefore, an orthogonalisation of the 
feature space is required. In this study, two 
orthogonalisation methods were tested. 

A principal component analysis (PCA) is a generic orthogonalisation
method. A transformation matrix is estimated, which is then applied
to the feature vector before classification. To avoid a dominance
of large-scale features, all features were scaled to ensure a
standard deviation of one before calculating the PCA coefficients.

As an alternative method, a feature selection 
method is proposed. First, the individual clas¬ 
sification performance of each feature from the 
full feature set was computed, by training the 
classifier only on the given feature. Classifica¬ 
tion performance was then measured on the test 
data set. The features were then sorted by their 
individual classification performance. Starting 
with the best performing feature, features from 
the sorted feature set were added successively 
to the optimal feature set. This procedure was 
repeated until no further increase of classifica¬ 
tion performance was observed. Although this 
feature set is optimised for classification perfor¬ 
mance, it does not guarantee that it is orthog¬ 
onal in a statistical sense. 
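
The selection procedure described above can be sketched as a simple forward selection. The sketch makes two assumptions that the text leaves open: the search stops at the first feature that brings no improvement, and `evaluate` is a caller-supplied function (hypothetical here) that trains the classifier on the given feature subset and returns the performance on the test data set.

```python
def select_features(features, evaluate):
    """Forward feature selection in order of individual performance.

    evaluate(subset) must return the classification performance obtained
    when only the features in subset are used.
    """
    # sort by the performance of each feature on its own, best first
    ranked = sorted(features, key=lambda f: evaluate([f]), reverse=True)
    selected, best = [], float("-inf")
    for feature in ranked:
        score = evaluate(selected + [feature])
        if score <= best:
            break  # assumption: stop at the first non-improvement
        selected.append(feature)
        best = score
    return selected, best
```

Because features are only ever added in a fixed order, this needs far fewer training runs than an exhaustive subset search, at the cost of possibly missing a better combination.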


Low-pass filtering. Low-pass filtering of the features across time
results in a smaller feature variance within a class and thus a
better class separation, which as a consequence leads to a better
classification performance. On the other hand, with long time
constants the classifier is not able to track transitions between
the classes. The time constant $\tau$ can be adapted to the
expected frequency of class transitions in the test data, or to
increase classification performance and stability. The optimal
$\tau$ was determined by a one-dimensional grid search, with and
without PCA and feature selection.
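
The filter type is not specified in the text. As an illustration, a first-order recursive low-pass with time constant $\tau$ could look as follows; the function name and the choice of filter are assumptions, while the default frame rate of 10 Hz is taken from section 2.1.

```python
import math


def lowpass_features(frames, tau, frame_rate=10.0):
    """First-order recursive low-pass applied to each feature over time.

    frames is a list of feature vectors, one per time frame.
    """
    if tau <= 0:
        return [list(f) for f in frames]  # no filtering
    c = math.exp(-1.0 / (tau * frame_rate))  # per-frame decay factor
    state, out = list(frames[0]), []
    for f in frames:
        state = [c * s + (1.0 - c) * v for s, v in zip(state, f)]
        out.append(state)
    return out
```

With this filter, a step in a feature reaches about 63% of its final value after $\tau$ seconds, which shows the trade-off mentioned above: a long $\tau$ smooths the features but delays the detection of class transitions.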

2.5 Classification 

To accomplish the gesture classification task, a Gaussian Naive
Bayesian Classifier was implemented. This approach assumes a set of
conditionally independent and normally distributed features. Each
class $c_h$, where $h = 1, \dots, N_c$ is the class index and $N_c$
is the total number of classes, represented a different gesture or
posture. $\mathbf{f}$ is a data vector with extracted features
$f_j$, where $j = 1, \dots, N_f$ is the feature index and $N_f$ is
the number of features. Considering the independence assumption,
Bayes' formula can be written in the following form:

$$p(c_h|\mathbf{f}) = \frac{p(\mathbf{f}|c_h)\,p(c_h)}{p(\mathbf{f})} = \frac{\prod_{j=1}^{N_f} p(f_j|c_h)\,p(c_h)}{p(\mathbf{f})} \qquad (6)$$

which means that the overall class conditional probability
$p(\mathbf{f}|c_h)$ can be computed by multiplying the conditional
probabilities for each feature $p(f_j|c_h)$.

Figure 2: Labels of the project 1 (research application).

Since the elements of $\mathbf{f}$ are assumed to be normally
distributed, $p(f_j|c_h) = \mathcal{N}(\mu_{jh}, \sigma_{jh})$, the
probability density function (PDF) of a feature $j$ for a class $h$
can be modelled by the mean $\mu_{jh}$ and standard deviation
$\sigma_{jh}$. These parameters were estimated from the manually
labelled training data. Also a flat prior probability was assumed,
$p(c_h) = 1/N_c$.

In the current study, probabilities $p(c_h|\mathbf{f}(t))$ were
calculated for each object in each time frame. For estimating the
classification performance, the confusion matrix was computed as an
average posterior probability for each class. The classification
performance was the geometric average across the diagonal of the
confusion matrix.
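
The classifier and the performance measure can be sketched in a few lines of Python. This is an illustration rather than the openMHA or Matlab code of the study: the function names and the data layout are assumptions, a small lower bound on the standard deviation is added for numerical safety, and the posterior is computed in the log domain to avoid underflow when many features are multiplied.

```python
import math


def train(samples):
    """samples maps a class label to a list of feature vectors."""
    model = {}
    for label, rows in samples.items():
        n = len(rows)
        mu = [sum(col) / n for col in zip(*rows)]
        sigma = [max(math.sqrt(sum((v - m) ** 2 for v in col) / n), 1e-6)
                 for col, m in zip(zip(*rows), mu)]
        model[label] = (mu, sigma)
    return model


def posterior(model, f):
    """Posterior probability of each class for feature vector f (flat prior)."""
    log_like = {}
    for label, (mu, sigma) in model.items():
        log_like[label] = sum(
            -0.5 * ((v - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
            for v, m, s in zip(f, mu, sigma))
    peak = max(log_like.values())
    unnorm = {k: math.exp(v - peak) for k, v in log_like.items()}
    total = sum(unnorm.values())  # plays the role of p(f); flat prior cancels
    return {k: v / total for k, v in unnorm.items()}


def performance(confusion):
    """Geometric mean of the diagonal of a confusion matrix."""
    diag = [confusion[i][i] for i in range(len(confusion))]
    return math.exp(sum(math.log(max(d, 1e-12)) for d in diag) / len(diag))
```

The geometric mean penalises a single poorly recognised class more strongly than an arithmetic mean would, so a high value requires all classes to be recognised reasonably well.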

2.6 Classification tasks and class labels. 

The training was executed for three different 
classification tasks, corresponding to the use 
cases in hearing research, art and entertain¬ 
ment. In each training data set, data from nine 
test subjects (age from 23 to 44 years) was used. 
The recording of each gesture or posture lasted 
approximately 90 seconds for each subject. 

In the first task (‘project 1’), four classes with typical
communication states were defined with the intention to track the
subject’s behaviour during the hearing experiment. There were three
sitting postures with labels relaxed, straight, forward, and a
class corresponding to gesticulation while talking, gestures.

The second task (‘project 2’) consisted of eight classes, either
body movements or postures, which were used for controlling and
mixing a sound and video art installation concerning different
manifestations of water. The ’water’ classes had the following
labels: lake, rain, ice, waves, ocean, boil, steam and thunder.

Figure 3: Labels of the project 2 (audio-visual art installation).

Figure 4: Labels of the project 3 (dance-floor light control).

The third classification task (‘project 3’) con¬ 
tained five classes related to typical actions on 
a dance-floor at parties, to control the light ac¬ 
cording to individual behaviour of the dancers. 
The labels stand (standing or slowly walking), 
beer (drinking from a bottle), dance (dancing), 
xdance (excessive dancing) and windmill (ro¬ 
tating head) were used. 

Figures 2, 3 and 4 present the selection of classes for each
project.
Figure 5: Classification performance as a function of feature low-pass filter time constant $\tau$ for
the orthogonalisation methods ’none’, ’proposed’, ’PCA’ and the combination of ’proposed’ with
’PCA’, in the three tested projects. The shaded area denotes the chance level.


3 Results 

3.1 Influence of feature pre-processing 
on classification performance 

Time constant optimisation. Figure 5
shows the classification performance as a function
of the feature low-pass filter time constant τ
in all tested projects. The optimal value for
project 1 was 297 ms, resulting in a classification
performance of 81.8%. In project 2, the
optimal time constant was 250 ms with a
performance of 82.9%. In the third project, the
optimal time constant τ was 8 s, leading to a
classification performance of 78.4%.
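To make the role of the time constant concrete, here is a minimal C++ sketch of a feature smoother. It assumes a first-order (one-pole) low-pass filter, which is an assumption made for illustration; the class name `FeatureSmoother` and the frame-rate parameter are hypothetical and not taken from the study.

```cpp
#include <cassert>
#include <cmath>

// First-order (one-pole) low-pass smoother for one feature stream.
// tau_s is the time constant in seconds, frame_rate_hz the number of
// feature frames per second (both illustrative parameters).
class FeatureSmoother {
public:
  FeatureSmoother(double tau_s, double frame_rate_hz)
      : m_coef(std::exp(-1.0 / (tau_s * frame_rate_hz))), m_state(0.0) {
    assert(tau_s > 0.0 && frame_rate_hz > 0.0);
  }

  // y[n] = (1 - c) * x[n] + c * y[n-1]
  double process(double x) {
    m_state = (1.0 - m_coef) * x + m_coef * m_state;
    return m_state;
  }

private:
  double m_coef;  // feedback coefficient derived from tau
  double m_state; // previous output
};
```

With such a filter, a step in a feature reaches about 63% of its final value after one time constant. A long time constant therefore stabilises the classification but delays the response at class transitions.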

In all cases, the feature orthogonalisation
improved the performance. The maximum
performance was always achieved with the
proposed method for feature selection. Using
the PCA alone increased the performance only
marginally. Both methods in combination did
not give better performance results than the
proposed method alone.

Feature selection. Figure 6 shows the
performance of individual features in the three
different projects. In project 1, the proposed
feature selection method reduced the dimensionality
to 12 features. The performance of individual
features ranged from 19.1% to 44.4%. Of the
selected features, 42.7% were velocity-related.
In project 2, a set of 17 features was
found to be optimal; individual performance
ranged from 13% to 28.4%. 35.3% of the
features were velocity-related. In the last project,
only 9 features were sufficient for optimal
classification, with individual performance between
26.3% and 48.2%. In this case, 66.7% of the
features were related to motion.


[Figure 7 panels: project 1 (classes straight, relaxed, forward, gestures) and project 3 (classes stand, beer, dance, xdance, windmill); horizontal axis: time / s]

Figure 7: Posterior class probability as a function
of time for the test data, with the labelled
classes indicated by red lines.

3.2 Classification performance with 
optimised parameter sets 

Figure 7 shows the posterior probabilities as a
function of time. It can be noticed that
classification errors mostly occurred at class
transitions. With the longer time constants of project
3, a lag of classification at each transition can
be seen.

The confusion matrices are shown in Figure 8.
In project 1, the fewest confusions occurred
for the forward class. Typical confusions were
between the classes straight and relaxed, as
well as between gestures and straight. In
project 2, the fewest confusions were found for
the classes boil, ocean and steam. The class
thunder was often confused with the classes
rain or steam. In the third project, more
confusions can be noticed. Most confusions can be
found for the classes xdance and windmill, and
between the classes beer and dance.

Figure 6: Classification performance of single features (thick bars) and the cumulative
classification performance (thin bars). Stars denote the features which were selected by the proposed
orthogonalisation method. Blue colours denote velocity-related features.

Figure 8: Confusion matrices (average posterior probability for each class in the test data set) of
the three different projects.

4 Discussion 

The results show that a robust classification of
gestures and postures based on a low-level
feature set is possible, even with a naive Bayesian
classifier and a small feature space. The
pre-processing of features indicated that an
orthogonalisation of the feature space in a statistical
sense is less important than the selection of
features with an optimal class separation. However,
it is still unclear whether another order of
combination of orthogonalisation methods or a
dimension reduction in the PCA would further
increase performance.


In this study, only a small number of low-level
features was used. A high-level feature space, e.g.,
skeleton modelling, might be beneficial for
robust classification of complex and high-level
gestures. On the other hand, using such low-level
features does not require any model assumptions.
An intermediate solution could be an
advanced segmentation of the point cloud.

5 Conclusions

In this study it was shown that even with
a small and low-level point-cloud based feature
space a robust classification of gestures
and postures is possible. The tested applications
covered research, art and entertainment,
with four to eight classes in each application.
The proposed method of feature-space optimisation
by selecting a subset of the features
was shown to result in better classification
performance than a statistical orthogonalisation
method. Low-pass filtering of features with
application-specific time constants allowed for
a trade-off between stable classification and fast
reactions at class transitions. Classification
performance of approximately 80% was achieved in
all applications. Automatic classification of
gestures and postures for hearing research
applications with the ‘subject-in-the-loop’, i.e., with a
behavioural feedback loop, seems feasible.

Abstract

VoiceOfFaust turns any monophonic sound into a 
synthesizer, preserving the pitch and spectral 
dynamics of the input. 

There are seven synthesizer and two effect algorithms:

• a classic channel vocoder 

• a couple of vocoders based on oscillators 
with controllable formants: 

° CZ resonant oscillators 
° PAF oscillators 
° FM oscillators 
° FOF oscillators 

• FM with modulation by the voice 

• ring-modulation 

• Karplus-Strong used as an effect 

• Phase modulation used as an effect 

Keywords 

Synthesis, Signal Processing, Audio Plugins. 

1 Introduction 

VoiceOfFaust turns any monophonic sound into a
synthesizer, preserving the pitch and spectral
dynamics of the input. It is written in Faust [1], and
uses a pitch tracker in Pure Data [2].

It consists of: 

• an external pitch tracker: helmholtz~ [3] by
Katja Vetter.

• a compressor/expander, called qompander 
[4], ported to Faust. 

There are seven synthesizer and two effect algorithms:

• a classic channel vocoder 

• a couple of vocoders based on oscillators 
with controllable formants: 

° CZ resonant oscillators 
° PAF oscillators 
° FM oscillators 
° FOF oscillators 


• FM with modulation by the voice 

• ring-modulation 

• Karplus-Strong used as an effect 

• Phase modulation used as an effect 

The features include: 

• all oscillators are synchronized to a single 
saw-wave, so they stay in phase, unless you 
don't want them to 

• powerful parameter mapping system lets 
you set different parameter values for each 
band, without having to set them all 
separately 

• formant compression/expansion: Make the 
output spectrum more flat or more resonant, 
at the twist of a knob. 

• flexible input and output routing: change the
character of the synth.

• all parameters, including routing but excluding
the octave, are stepless, meaning any
'preset' can morph into any other.

• multi-band deEsser and reEsser 

• optionally use as a master-slave pair: 

The master is a saw oscillator driven by the
(external) pitch tracker, and the slaves
contain everything else, synced to the
master.

This makes it possible to run the slaves as 
plugins. 

• configuration file: 

Through this file, lots of options can be set
at compile time, allowing you to adapt the
synth to the amount of CPU power and
screen real estate available.

Some of the highlights: 

• number of bands of the vocoders 

• number of output channels 

• whether we want ambisonics output 

• whether a vocoder has one set of 
oscillators, or a separate set of oscillators 
per output. 

2 Vocoders

2.1 Common features of all vocoders 

2.1.1 Parameter mapping system 

The parameters for the vocoders use a very 
flexible control system: 

Each parameter has a bottom and a top knob, 
where the bottom changes the value at the lowest 
formant band, and the top the value at the highest 
formant band. 

The rest of the formant bands get values that are 
evenly spaced in between. 

For some of them that means linear spacing, for 
others logarithmic spacing. 

For even more flexibility there is a parametric 
mid: 

You set its value and band number, and the
parameter values are now:

• 'bottom' at the lowest band, going to:

• 'mid value' at band number 'mid band', going to:

• 'top value' at the highest band.

Kind of like a parametric mid in equalizers.

If that's all a bit too much, just set "para" to 0
in the configuration file, and you'll have just the
top and bottom settings.
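The mapping described in this section can be sketched in C++ as follows. This is not the Faust code of VoiceOfFaust. The function name `band_value` and its arguments are hypothetical, and the logarithmic variant assumes strictly positive values (such as frequencies).

```cpp
#include <cassert>
#include <cmath>

// Interpolate between a and b for t in [0, 1], linearly or logarithmically.
double interpolate(double a, double b, double t, bool logarithmic) {
  if (logarithmic) {
    assert(a > 0.0 && b > 0.0); // log spacing needs positive values
    return a * std::pow(b / a, t);
  }
  return a + t * (b - a);
}

// Parameter value at one band: 'bottom' at band 0, 'mid' at band
// mid_band and 'top' at band nbands - 1, evenly spaced in between.
double band_value(int band, int nbands, double bottom, double mid,
                  int mid_band, double top, bool logarithmic) {
  assert(nbands >= 2 && band >= 0 && band < nbands);
  assert(mid_band >= 0 && mid_band < nbands);
  if (band <= mid_band) {
    const double t =
        mid_band > 0 ? static_cast<double>(band) / mid_band : 1.0;
    return interpolate(bottom, mid, t, logarithmic);
  }
  const double t =
      static_cast<double>(band - mid_band) / (nbands - 1 - mid_band);
  return interpolate(mid, top, t, logarithmic);
}
```

For example, with 16 bands, a bottom of 100, a mid value of 1000 at band 8 and a top of 10000, logarithmic spacing places band 4 at roughly 316.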

2.1.2 Formant compression/expansion 

Scale the volume of each band relative to the 
others: 

• 0 = all bands at average volume 

• 1 = normal 

• 2 = expansion 

Expansion here means:

• the loudest band stays the same 

• soft bands get softer 

Because low frequencies contain more energy than 
high ones, a lot of expansion will make your 
sound duller. 

To counteract that, you can apply a weighting 
filter, settable from 

• 0 = no weighting 

• 1 = A-weighting 

• 2 = ITU-R 468 weighting 

2.1.3 DeEsser 

To tame harsh esses, especially when using some 
formant compression/expansion, there is a 
deEsser: 


It has all the usual controls, but since we are
already working with signals that are split up in
bands, with known volumes, it was implemented
rather differently:

• multiband, yet much cheaper,

• without additional filters, even for the
sidechain,

• and with a dB per octave knob for the
sidechain, from 0dB/oct (bypass), to
60dB/oct (fully ignore the lows).

It also has a (badly named) noise strength
parameter: it uses the fidelity parameter from the
external pitchtracker to judge if a sound is an S.
When you turn it up, the deEsser gets disabled
when the pitchtracker claims a sound is pitched.
See [3] for more info.

2.1.4 ReEsser 

Disabled by default, but can be enabled in the 
configuration file. 

It replaces or augments the reduced highs caused 
by the deEsser. 

2.1.5 DoubleOscs 

This is a compile option, with two settings: 

• 0 = have one oscillator for each formant
frequency

• 1 = creates a separate set of oscillators for 
each output channel, with their phase 
modulations reversed. 

2.1.6 Input and output routing

The vocoders can mix their bands together in 
various ways: 

We can send all the low bands left and the high
ones right, we can alternate the bands between left
and right, we can do various mid-side variations,
and we can even do a full Hadamard matrix.

All of these, and more, can be cross-faded 
between. 

In the classicVocoder, a similar routing matrix sits 
between the oscillators and the filters. 
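To give an idea of what a Hadamard routing does, here is a small C++ sketch. It is not the Faust code used in VoiceOfFaust. It builds the matrix with the Sylvester construction, which requires the number of bands to be a power of two, and the names `hadamard` and `route` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sylvester construction: H(1) = [1], H(2n) = [[H, H], [H, -H]].
// n must be a power of two.
std::vector<std::vector<int>> hadamard(std::size_t n) {
  assert(n > 0 && (n & (n - 1)) == 0);
  std::vector<std::vector<int>> h(n, std::vector<int>(n, 1));
  for (std::size_t size = 1; size < n; size *= 2)
    for (std::size_t r = 0; r < size; ++r)
      for (std::size_t c = 0; c < size; ++c) {
        h[r][c + size] = h[r][c];
        h[r + size][c] = h[r][c];
        h[r + size][c + size] = -h[r][c];
      }
  return h;
}

// Mix one frame of band signals into the same number of outputs:
// every output receives every band, with a sign taken from the matrix.
std::vector<double> route(const std::vector<double> &bands) {
  const std::vector<std::vector<int>> h = hadamard(bands.size());
  std::vector<double> out(bands.size(), 0.0);
  for (std::size_t o = 0; o < out.size(); ++o)
    for (std::size_t b = 0; b < bands.size(); ++b)
      out[o] += h[o][b] * bands[b];
  return out;
}
```

Because the rows are mutually orthogonal, every output gets a different sign pattern of the same bands, which is what produces the cancellation effects between channels.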

2.1.7 Phase parameters 

Since all formants (except in the classicVocoder)
are made by separate oscillators that are synced
to a single master oscillator, you can set their
phases relative to each other.

This allows them to sound like one oscillator
when they have static phase relationships, and to
sound like many detuned oscillators when their
phases are moving.

Together with the output routing, it can also create
interesting cancellation effects.

For example, with the default settings for the 
FMvocoder, the formants are one octave up from 
where you'd expect them to be. 

When you change the phase or the output routing, 
they drop down. 

These settings are available: 

• static phases 

• amount of modulation by low pass filtered 
noise 

• the cutoff frequency of the noise filters 

2.2 Features of individual vocoders 

2.2.1 Classic Vocoder 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/classicVocoder-svg/process.svg

A classic channel vocoder, with: 

• a "super-saw" that can be cross-faded to a
"super-pulse", freely adapted from Adam Szabo [5].

• flexible Q and frequency setting for the filters

• an elaborate feedback and distortion matrix 
around the filters 

The GUI of the classicVocoder has two sections.
The first, oscillators, contains the parameters
for the carrier oscillators.

These are regular virtual analog oscillators, with 
the following parameters: 

• cross-fade between oscillators and noise 

• cross-fade between sawtooth and pulse 
wave 

• width of the pulse wave 

• mix between a single oscillator and
multiple detuned ones

• detuning amount 

The second, filters, contains the parameters for
the synthesis filters:

• bottom, mid and top set the resonant 
frequencies 

• Q for bandwidth 

• a feedback matrix: each filter gets fed back
a variable amount of:

° itself

° its higher neighbor
° its lower neighbor
° all other filters 
° distortion amount 
° DC offset 


2.2.2 CZvocoder 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/CZvocoder-svg/process.svg

This is the simplest of the vocoders made out of
formant oscillators.

The oscillators were ported from a Pd patch by
Mike Moser-Booth [6].

You can adjust: 

• the formant frequencies 

• the phase parameters 

2.2.3 PAFvocoder 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/PAFvocoder-svg/process.svg

The oscillators were ported from a Pd patch by
Miller Puckette [7].

It also has frequencies and phases, but adds index 
for brightness. 

2.2.4 FMvocoder 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/FMvocoder-svg/process.svg

The oscillators were based on code by Chris
Chafe [8].

Same parameters, different sound. 

2.2.5 FOFvocoder 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/FOFvocoder-svg/process.svg

Original idea by Xavier Rodet [9], based on
code by Michael Jorgen Olsen [10].

Also has frequencies and phases, but adds: 

• skirt and decay: 

Two settings that influence the brightness 
of each band 

• Octavation index 

Normally zero. If greater than zero, 
lowers the effective frequency by 
attenuating odd-numbered sinebursts. 
Whole numbers are full octaves, fractions
transitional.

Inspired by an algorithm in Csound [11]. 

3 Other synthesizers 

These are all synths that are not based on 
vocoders. 

3.1 Features of individual synths 

3.1.1 FMsinger 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/FMsinger-svg/process.svg

A sine wave that modulates its frequency with the 
input signal. 

There are five of these, one per octave, and each 
one has: 

• volume 

• modulation index 

• modulation dynamics 

This fades between 3 settings: 

° no dynamics: the amount of 
modulation stays constant with 
varying input signal 

° normal dynamics: more input volume 
equals more modulation 
° inverted dynamics: more input equals 
less modulation. 

3.1.2 CZringmod 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/CZringmod-svg/process.svg


Ring-modulates the input audio with emulations of
Casio CZ oscillators.

Again five octaves, with each octave containing 
three different oscillators: 

• square and pulse, each having volume and 
index (brightness) controls 

• reso, having a volume and a resonance 
multiplier: 

This is a formant oscillator, and its
resonant frequency is multiplied by the
formant setting top right.

It is intended to be used with an external 
formant tracker. 

• There is a global width parameter that
controls a delay on the oscillators for one
output.

The delay time is relative to the
frequency.

Because this delay is applied to just the
oscillators, and before the
ring modulation, the sound of both output
channels arrives simultaneously.

This creates a mono-compatible widening
of the stereo image.

3.1.3 KarplusStrongSinger 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/KarplusStrongSinger-svg/process.svg

This takes the idea of the Karplus-Strong algorithm
[12], but instead of noise, it uses the input signal.
The feedback is run through an allpass filter,
modulated with an LFO; adapted from the
nonLinearModulator in instrument.lib.

To keep the level from going out of control, there 
is a limiter in the feedback path. 

Parallel to the delay is a separate 
nonLinearModulator. 
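The core of this idea can be sketched in C++ as a feedback delay line that is excited by the input signal. This is a simplified illustration, not the Faust implementation: the modulated allpass filter is left out, a hard clip stands in for the limiter, and the class name `InputDrivenString` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Feedback delay line driven by the input instead of a noise burst.
// The delay length in samples sets the pitch (sample rate / frequency).
class InputDrivenString {
public:
  InputDrivenString(std::size_t delay_samples, double feedback,
                    double limit)
      : m_buf(delay_samples, 0.0), m_pos(0), m_fb(feedback),
        m_limit(limit) {
    assert(delay_samples > 0 && limit > 0.0);
  }

  double process(double in) {
    // Hard clip as a crude stand-in for the limiter in the feedback path.
    const double delayed =
        std::max(-m_limit, std::min(m_limit, m_buf[m_pos]));
    const double out = in + m_fb * delayed;
    m_buf[m_pos] = out; // write back, to be read delay_samples later
    m_pos = (m_pos + 1) % m_buf.size();
    return out;
  }

private:
  std::vector<double> m_buf; // circular delay buffer
  std::size_t m_pos;         // current read/write position
  double m_fb;               // feedback amount
  double m_limit;            // clip level in the feedback path
};
```

An impulse fed into this structure comes back every `delay_samples` samples, scaled by the feedback amount each time, so the input rings at the pitch set by the delay length.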

Globally you can set: 

• octave 

• output volume 

• threshold of the limiter

For the allpass filters you can set:

• amount of phase shift 

• difference in phase shift between left and 
right (yeah, I lied, there are two of 
everything) 

• amount of modulation by the LFO 

• frequency of the LFO, relative to the main 
pitch 

• phase offset between the left and right
LFOs.

To round things off there is a volume for the dry 
path and a feedback amount for the delayed one. 

3.1.4 KarplusStrongSingerMaxi 

Block-diagram: 

https://magnetophon.github.io/VoiceOfFaust/images/KarplusStrongSingerMaxi-svg/process.svg


To have more voice control of the spectrum, this 
one has a kind of vocoder in the feedback path. 
Since we don't want the average volume of the 
feedback path changing much, only the volumes 
relative to the other bands, the vocoder is made 
out of equalizers, not bandpass filters. 

You can adjust its:

• strength: from bypass to 'fully equalized'

• cut/boost: steplessly vary between

° -1 = all bands have negative gain,
except the strongest, which is at 0
° 0 = the average gain of the bands is 0
° +1 = all bands have positive gain,
except the weakest, which is at 0

• top and bottom frequencies 

• Q factor 
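One possible reading of the cut/boost setting is sketched below in C++. It is an interpretation for illustration, not the Faust code of the synth. It assumes that the equalizer gains follow the measured band levels in dB, shifted by a reference that moves from the strongest band (-1) via the average (0) to the weakest band (+1). The name `band_gains_db` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// levels_db: measured level of each band in dB.
// cut_boost: -1 = cut only, 0 = zero average gain, +1 = boost only.
std::vector<double> band_gains_db(const std::vector<double> &levels_db,
                                  double cut_boost) {
  assert(!levels_db.empty());
  assert(cut_boost >= -1.0 && cut_boost <= 1.0);
  const double hi = *std::max_element(levels_db.begin(), levels_db.end());
  const double lo = *std::min_element(levels_db.begin(), levels_db.end());
  const double mean =
      std::accumulate(levels_db.begin(), levels_db.end(), 0.0) /
      static_cast<double>(levels_db.size());
  // Move the reference from the mean towards the strongest band when
  // cutting, and towards the weakest band when boosting.
  const double target = cut_boost < 0.0 ? hi : lo;
  const double amount = cut_boost < 0.0 ? -cut_boost : cut_boost;
  const double reference = mean + amount * (target - mean);
  std::vector<double> gains;
  gains.reserve(levels_db.size());
  for (double level : levels_db) gains.push_back(level - reference);
  return gains;
}
```

With band levels of -6, -12 and -18 dB, a setting of -1 gives gains of 0, -6 and -12 dB, a setting of 0 gives 6, 0 and -6 dB, and a setting of +1 gives 12, 6 and 0 dB.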

4 Master-slave 

This is a workaround for the need for an 
external pitchtracker, making it possible to use the 
synths and effects as plugins. 

It has the nice side effect that your sounds become
fully deterministic. Without it, a pitchtracker will
always output slightly different data, or at least at
slightly different moments relative to the audio, so
the output audio can sometimes change quite a bit
from run to run.

The master is a small program that receives the
audio and the OSC messages from the external
pitch tracker, and outputs:

• a copy of the input audio 

• a saw wave defining the pitch and phase 

• the value of fidelity, from the pitch 
tracker, as audio. 

The slaves are synths and effects that input the 
above three signals. 

The outputs of the master can be recorded into a 
looper or DAW, and be used as song building 
blocks, without needing the pitch tracker. 

This makes it possible to switch synths, automate 
parameters, etc. 

5 Strengths and weaknesses of Faust

The Faust language has some big advantages. 
The common perks of the language apply. For me, 
the biggest ones are: 

• Quick implementation of ideas. 

• If it sounds right, it is right. There won’t 
be any crashes, memory leaks or other 
bugs. 

• Write once, deploy everywhere. 

• The block diagrams help with debugging 
and documentation. 

• Fast running code. 

• Automatic parallelisation.

In this project it was also very helpful to be able
to easily parameterize things like the number of
bands. Related: the input and output routing
wouldn’t be nearly as easy and fun to implement
in most languages, as it leans heavily on Faust's
splitting and combinatory operators.

Since this idea has been implemented in 
PureData earlier, it makes sense to mention two 
big advantages over that: 

1. Text-interface, enabling quicker notation 
of ideas, version-control and a mouseless 
workflow. 

2. Single sample feedback loops, as used in 
the classicVocoder. 

The downsides of Faust to me are a steep
learning curve and error messages that are often
very verbose and unclear.


6 Use cases 

The author has used VoiceOfFaust mostly for 
voice transformation in a musical context, but it 
has also come in handy to turn a bass-guitar into a 
synth [14]. 

7 Deployment 

VoiceOfFaust leans heavily on knowing the
pitch of the input signal. Since it’s not yet
possible to do decent pitchtracking in Faust, an
external pitchtracker, which sends the pitch through
OSC, is used.

This limits the usable architectures to the ones 
supporting OSC. 

Specifically, it would be nice to have 
VoiceOfFaust as a plugin within a DAW, but that 
is not directly possible. 

The master-slave architecture is a usable
workaround.

To compile VoiceOfFaust, run one of the 
compilation scripts that support OSC, for 
example: 

faust2jack -osc FMvocoder.dsp 

To run it, you can use one of the scripts in the 
launchers directory, for example: 

./FMvocoder_PT

This will start Pure Data with the pitchtracker
patch plus a synth, and connect everything through
JACK.

Abstract

This paper brings together some ideas regarding
computer music instrument development with
respect to the C++ language. It looks at these from
two perspectives, that of the development of
self-contained instruments with the use of a class library
and that of programming of plugin modules for a
music programming system. Working code examples
illustrate the paper throughout.

Keywords 

Computer Music Instruments, C++, Music Pro¬ 

1 Introduction

Whatever we do, if we are in the business of
making music solely or primarily with computers,
one way or another, at some point, we will
meet computer music instruments [Lazzarini,
2017a]. Whether we are making electroacoustic
music, algorithmic composition, live coding,
tracking, or creating pop tunes, we will find
ourselves manipulating these. They can present
themselves through music programming systems
[Lazzarini, 2013], such as Csound [Lazzarini
et al., 2016] or Faust [Orlarey et al., 2004],
or as software synthesizers, plugins, audio
processing programs, etc. There is a wide variety
of forms. In this paper, I would like to
contemplate one of these that involves libraries,
compilers, and the C++ language.

C++ was once described as having “the
elegance, the power, and the simplicity of a hand
grenade”, which to me, as a die-hard pure
C programmer, sounds about right. However,
I must admit that its latest standards, ISO
C++11 [ISO/IEC, 2011], C++14 [ISO/IEC,
2014], and the forthcoming C++17 [ISO/IEC,
2017], arriving in quick succession as they are,
are making this monstrous language more
attractive. Now finally we can write a nice lambda
and pass it to a map to process a list, for
example. The standard library, born out of
the much appreciated, much maligned, standard
template library (STL), has actually become
quite usable. There is still enough complexity
for one to get entangled, however, but
with moderation and good design, we can make
it work for us.

This paper will examine two approaches to
C++ instrument making. The first one is based
on employing a signal processing library to write
simple, straightforward programs that can be
ported to various platforms. The second is to
create components, plugins, for Csound using
a framework that sits atop the system
implementation in C. It is mostly directed at
computer music practitioners who can converse in
C/C++, and it will be fully illustrated by working
code, which can also be found in
an online repo (links will be given).

2 AuLib Instruments 

Towards the end of 2016, I decided to collect
a number of digital signal processing (DSP)
algorithms that I had been writing or studying
throughout the years into a simple, lightweight,
flexible C++ library, called AuLib¹. One of my
aims was to document these uniformly in efficient
and readable code so that they could become
somewhat of a reference for me and others.
I was also rewriting some of my teaching
materials and this became part of them. Following
a number of refactoring steps, I settled on a
design that followed modern C++ standards in
employing the standard library as much as possible
to handle resources and keeping the code
as simple and lightweight as possible.

When designing a class library, there are two
distinct possibilities (amongst the various decisions
we have to make) with respect to object
hierarchies. One is that we can define a base class
for DSP objects that has a shared processing
interface, that is, one (or more) DSP methods that
are specialised in derived classes. A means of
connecting objects is provided separately from
this, and once connected, we can place the objects
in a list of references or pointers to the base
class and call the processing method of each in
turn to get an output signal. This is what is at
the heart of processing engines such as the ones
in Pd, Csound and Faust. If the aim is to create a
library whose main objective is to be employed
as an engine for some higher-level programming
or patching system, this is the way to go. The
Sound Object Library [Lazzarini, 2000] was
designed this way and it really paid off when it
was later wrapped up in Python.

¹ github.com/vlazzarini/aulib/

The alternative is to relax this constraint and
not provide a unified processing interface, leaving
it to derived classes to define their own. The
advantage of this is that each class can have
different ways to handle input parameters to
processing methods, depending on what they are
supposed to do. So an oscillator might have
amplitude and frequency as parameters, in scalar
or vectorial forms, or no parameters at all (for,
say, fixed values of amplitude and frequency).
It can provide a bunch of overloads to handle
each case. A filter will have an input signal
and optional parameters, depending on the
type. A frequency-domain object might take a
spectral frame. This, on one hand, simplifies
connections (we can define them at the processing
point, rather than separately), and on the
other makes it hard to use in sound engine
applications where the interface needs to be shared.

Given that the objective here for this library
was to provide a working context for a diverse
set of algorithms, and to provide a flexible
means of using them in programs, I have
opted for the second approach. This would
provide greater freedom to create exactly the
right form to hold each DSP formulation. Now,
given this context, it is still desirable to use the
class structure afforded by C++ to re-use code
fully. This meant designing a base class that
was a container for an audio signal, providing
the typical fundamental operations we would
like to perform on it. For me, this meant: scaling
(multiplying by a scalar), offsetting (adding
a scalar), mixing (adding a vector/signal) and
ring modulating (multiplying by a vector/signal).
Granted, in an audience of music and
audio developers, we are likely to find multiple
definitions of what fundamental operations on
signals are, but I am drawing the line here (ok
maybe not quite, but let’s keep at this for the
moment). Attributes such as number of channels
(interleaved), sampling rate and vector size
are also needed, and of course the audio signal
vector itself.

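This makes up the AudioBase class of the
library, which begins like this:

class AudioBase {
 protected:
  uint32_t m_nchnls;             // number of channels (interleaved)
  uint32_t m_vframes;            // vector size in frames
  std::vector<double> m_vector;  // audio signal vector
  double m_sr;                   // sampling rate
  uint32_t m_error;              // error code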

Having a fundamentally neutral base, with no
hint of what a DSP object might want to implement,
allows me to use it for absolutely everything
I can think of, or almost. So of the 50-odd
classes currently sitting in the library, only
four are not derived from AudioBase (fig. 1).
It is specialised for common time-domain operations
(oscillators, filters, envelopes, etc.), for
spectral processing (short-time Fourier transform,
phase vocoder), for function tables, for
audio input/output, and even for higher-level
instrument models. Code re-use is truly maximised.
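The four fundamental operations named above (scaling, offsetting, mixing and ring modulation) can be illustrated with a small stand-alone container. This is a simplified sketch, not the AuLib code: the class name `Signal` is hypothetical, and channels, sampling rate and error handling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal audio signal container with the four basic operations.
class Signal {
public:
  explicit Signal(std::size_t vframes, double value = 0.0)
      : m_vector(vframes, value) {}

  // scaling: multiply every sample by a scalar
  Signal &operator*=(double s) {
    for (double &v : m_vector) v *= s;
    return *this;
  }
  // offsetting: add a scalar to every sample
  Signal &operator+=(double s) {
    for (double &v : m_vector) v += s;
    return *this;
  }
  // mixing: add another signal sample by sample
  Signal &operator+=(const Signal &other) {
    assert(other.size() == size());
    for (std::size_t i = 0; i < size(); ++i) m_vector[i] += other[i];
    return *this;
  }
  // ring modulation: multiply by another signal sample by sample
  Signal &operator*=(const Signal &other) {
    assert(other.size() == size());
    for (std::size_t i = 0; i < size(); ++i) m_vector[i] *= other[i];
    return *this;
  }

  double operator[](std::size_t i) const { return m_vector[i]; }
  std::size_t size() const { return m_vector.size(); }

private:
  std::vector<double> m_vector; // one vector of audio samples
};
```

Since each operator returns a reference to the modified object, the result of an operation can be passed straight on as the input of another processing step.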

3 Developing Instruments 

A detailed description of the library design is
offered elsewhere [Lazzarini, 2017c]. In this paper,
we will look at using it for C++ instrument
development. So let’s explore some cases².

3.1 Basic examples 

We begin with a trivial case: a ping instrument,
written as a command-line program. This just
plays a 440Hz, -6dB sine wave to the output for
a couple of seconds. The code, without its safety
checks etc., can be abbreviated as this seven-liner:

int main() {
  Oscil sig(0.5, 440);
  SoundOut output("dac");
  for (int i = 0; i < def_sr * 2;
       i += def_vframes)
    output.write(sig.process());
  return 0;
}

Frequency and amplitude are not changing,
so I pick the process() overload with no
parameters and stick its return value straight into
the output write() method. The two classes
share the base, but they have distinctly-named
and defined processing methods.

² All examples are available in the examples directory of
the AuLib repository.

[Figure 1 diagram; recoverable class labels: AuLib::MidiData, AuLib::MidiIn, AuLib::Segments, AuLib::TableSet]

Figure 1: The AuLib class library

Let’s try something slightly less simplistic. 
A similarly-placed instrument but now with a 
sweeping resonant filter acting on a sawtooth 
wave: 

int main() {
  TableSet saw(SAW);
  BlOsc sig(0.5, 440., saw);
  ResonR fil(1000, 1.);
  Balance bal;
  SoundOut output("dac");
  for (int i = 0; i < def_sr * 10;
       i += def_vframes) {
    sig.process();
    fil.process(sig,
                1000. + 400. * i / def_sr);
    bal.process(fil, sig);
    output.write(bal);
  }
  return 0;
}

TableSet creates a set of tables for a
band-limited oscillator. The filter centre frequency is
varied over time, and we feed its output into a
balancing operator that uses a comparator to
keep amplitudes under control.

To demonstrate how the base class-defined
operations can be useful, we have a simple FM
example:

int main() {
  double fm = 440., fc = 220., ndx = 5.;
  Oscili mod, car;
  SoundOut output("dac");
  for (int i = 0; i < def_sr * 10;
       i += def_vframes) {
    mod.process(ndx * fm, fm);
    car.process(0.5, mod += fc);
    output.write(car);
  }
  return 0;
}

Note the use of the overloaded sum-assignment
operator in mod += fc to add the
modulator signal to the carrier (scalar)
frequency.
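For readers without the library at hand, the same signal flow can be written out sample by sample with only the standard library. This is a sketch under assumptions: the function `fm_tone` is hypothetical, it uses `std::sin` directly instead of the library's interpolating table oscillator, and it returns the samples instead of writing them to an audio device.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simple FM: a sine modulator of amplitude ndx * fm and frequency fm
// is added to the carrier frequency fc, sample by sample.
std::vector<double> fm_tone(double fc, double fm, double ndx, double amp,
                            double sr, std::size_t nframes) {
  assert(sr > 0.0);
  const double twopi = 2.0 * std::acos(-1.0);
  std::vector<double> out(nframes);
  double mod_phase = 0.0, car_phase = 0.0;
  for (std::size_t i = 0; i < nframes; ++i) {
    const double mod = ndx * fm * std::sin(mod_phase);
    out[i] = amp * std::sin(car_phase);
    mod_phase += twopi * fm / sr;
    // instantaneous carrier frequency is fc plus the modulator signal
    car_phase += twopi * (fc + mod) / sr;
  }
  return out;
}
```

Calling `fm_tone(220., 440., 5., 0.5, 44100., 44100)` would produce one second of audio with the same parameter values as the listing above. For long outputs the phases should be wrapped to keep their precision.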

3.2 Instrument Models 

Clearly, the examples above are more
demonstrations of how instruments can be set up. These
would, in a more realistic scenario, be placed
in plugin or GUI application wrapping code,
where they can become useful. AuLib also
provides some modelling of instruments and
instances of these. We can show how these work
in a straightforward application case: a
polyphonic MIDI synthesiser.

The AuLib class Note provides the base for
an instance of a sound object, which can be,
for example, a synthesiser voice. This holds
basic parameters such as amplitude, cps pitch,
etc. that we can use to control a sound object.
To use it, we derive our own, and specialise its
dsp method, placing our sound processing code
there.

class SineSyn : public Note {
  // signal processing objects
  Adsr m_env;
  Oscili m_osc;

  // DSP override
  virtual const SineSyn &dsp() {
    if (!m_env.is_finished())
      set(m_osc(m_env(), m_cps));
    else
      clear();
    return *this;
  }

The sound synthesis is, again, trivial, to keep the example focused:
an envelope and a sine wave oscillator. But note that we have a new
convenient interface: using the classes’ operator(), we connect
objects more easily one into another. This syntax reinforces the
connection metaphor: the envelope, alongside the pitch, goes into the
oscillator. Given that the class is derived from AudioBase, we set
its vector to the result of the processing.
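The connection metaphor can be illustrated with a small self-contained sketch. The two classes below are hypothetical stand-ins, not the AuLib Adsr and Oscili classes: Decay is a simple linear envelope and Sine a sine oscillator, and both expose their processing through operator() so that one can be nested in a call to the other.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical envelope: a linear decay from 1 to 0 over `secs` seconds.
class Decay {
  Vec m_vector;
  double m_level;
  double m_step;

public:
  Decay(std::size_t vframes, double secs, double sr)
      : m_vector(vframes), m_level(1.), m_step(1. / (secs * sr)) {
    assert(secs > 0. && sr > 0.);
  }

  // Calling the object computes and returns the next block.
  const Vec &operator()() {
    for (auto &sample : m_vector) {
      sample = m_level;
      m_level = std::max(0., m_level - m_step);
    }
    return m_vector;
  }
};

// Hypothetical sine oscillator: signal amplitude, scalar frequency.
class Sine {
  Vec m_vector;
  double m_phase;
  double m_sr;

public:
  Sine(std::size_t vframes, double sr)
      : m_vector(vframes), m_phase(0.), m_sr(sr) {
    assert(sr > 0.);
  }

  const Vec &operator()(const Vec &amp, double cps) {
    assert(amp.size() == m_vector.size());
    const double twopi = 2. * std::acos(-1.);
    for (std::size_t n = 0; n < m_vector.size(); n++) {
      m_vector[n] = amp[n] * std::sin(m_phase);
      m_phase = std::fmod(m_phase + twopi * cps / m_sr, twopi);
    }
    return m_vector;
  }
};
```

Given Decay env(64, 1., 44100.) and Sine osc(64, 44100.), the call osc(env(), 440.) reads as "envelope, alongside pitch, into oscillator" and yields one block of output.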

Additionally, we want to specialise two other methods, for sound
onset and sound termination:

  // note off processing
  virtual void off_note() {
    m_env.release();
  }

  // note on processing
  virtual void on_note() {
    m_env.reset(m_amp, 0.01, 0.5, 0.25 * m_amp, 0.01);
  }

This plus the constructor completes our Note-derived class. Now we
want to model the whole synthesiser, not just its voices. To do this,
we can use the Instrument template class, instantiated with the
required number of voices and our note class:

Instrument<SineSyn> synth(8); 

An important aspect of this class is that it has a dispatch() method
that takes in five parameters (message type, channel, data1, data2,
time stamp) and responds to two message types (NOTE ON, NOTE OFF).
While these are the same as the MIDI channel messages, we are just
re-using the metaphor here. The call to dispatch() does not need to
originate from MIDI or be limited to the usual MIDI data ranges.
Specialisations of Instrument can re-implement message handling to
allow for other types. Instrument also handles polyphony using
last-note priority, and this can also be overridden in derived
classes.
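The dispatching scheme can be sketched independently of AuLib. The code below is an illustration only: Poly, Voice and Msg are hypothetical names, and last-note priority is interpreted here as "a new note takes a free voice if there is one, and otherwise replaces the voice that has been sounding longest", which is an assumption about the policy rather than a description of the library code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Message types, re-using the MIDI metaphor.
enum class Msg { note_on, note_off };

// Hypothetical voice: it only records what it is playing.
struct Voice {
  bool active = false;
  int chn = 0;
  double pitch = 0.;
  double amp = 0.;
  double stamp = 0.; // time stamp of its last note on
};

class Poly {
  std::vector<Voice> m_voices;

public:
  explicit Poly(std::size_t nvoices) : m_voices(nvoices) {
    assert(nvoices > 0);
  }

  // Five parameters: message type, channel, data1, data2, time stamp.
  // data1 and data2 are doubles, so they are not tied to 0-127.
  void dispatch(Msg type, int chn, double data1, double data2,
                double stamp) {
    if (type == Msg::note_on) {
      Voice *target = &m_voices[0];
      for (auto &v : m_voices) {
        if (!v.active) { // a free voice always wins
          target = &v;
          break;
        }
        if (v.stamp < target->stamp) // otherwise remember the oldest
          target = &v;
      }
      target->active = true;
      target->chn = chn;
      target->pitch = data1;
      target->amp = data2;
      target->stamp = stamp;
    } else {
      for (auto &v : m_voices)
        if (v.active && v.chn == chn && v.pitch == data1)
          v.active = false;
    }
  }

  std::size_t active() const {
    std::size_t count = 0;
    for (const auto &v : m_voices)
      if (v.active)
        count++;
    return count;
  }

  bool sounding(double pitch) const {
    for (const auto &v : m_voices)
      if (v.active && v.pitch == pitch)
        return true;
    return false;
  }
};
```

With two voices, a third note on replaces the earliest of the two sounding notes; a derived class could substitute a different allocation policy by re-implementing dispatch().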

Given that the example will use MIDI input, the library supports a
simple MIDI listener class that takes an Instrument object (or an
object of any type implementing dispatch() and process()) and
responds to messages. The complete program becomes very
straightforward (trivial signal handler implementation omitted):

int main() {
  int dev;
  Instrument<SineSyn> synth(8);
  SoundOut out("dac");
  MidiIn midi;

  std::signal(SIGINT, signal_handler);
  std::cout << "Available MIDI inputs:\n";
  for (auto &devs : midi.device_list())
    std::cout << devs << std::endl;
  std::cout << "choose a device: ";
  std::cin >> dev;

  if (midi.open(dev) == AULIB_NOERROR) {
    std::cout << "running... (use ctrl-c to close)\n";
    while (running)
      // listen to midi on behalf of synth
      out(midi.listen(synth));
  } else
    std::cout << "error opening device...\n";

  std::cout << "...finished \n";
  return 0;
}

Again, with a few lines of code, we can get 
a basic MIDI synthesiser instrument. Although 
the synthesis is simple, it can be shown that 
the effort involved in more complex examples 
scales well. It is just a case of using other signal 
processing objects in different arrangements. 

4 Csound Plugins 

The second case of C++ instrument development we will look at focuses
on creating components (plugins) that can be employed in a music
programming language. Unit generators in Csound are known as opcodes,
and the system has a well-documented C interface for adding new ones.
It also has a C++ base class that has been used for a small number of
opcode plugin libraries that come with the system.

With the intention of enabling a more complete and well-integrated
C++ support for plugin opcode development, I have introduced the
Csound Plugin Opcode Framework³ (CPOF, pronounced see-pough, or cipó,
meaning vine in Portuguese⁴). The actual framework part of it is
fairly light, consisting of two template base classes, but it also
contains an extensive set of utility classes that wrap Csound C code
for C++ use in a very idiomatic way (table 1). CPOF is discussed
extensively in [Lazzarini, 2017b].

³ available as part of Csound, github.com/csound/csound, with code
examples in the examples/plugin directory.

⁴ as in: C++ gives you enough vine, or rope, for you to either hoist
yourself up a tree, or hang yourself fairly decently.


Class           Description
--------------  --------------------------
Csound          The Csound engine
Params          Opcode parameters
AudioSig        Audio signals
Fsig            Spectral signals
Pvframe<T>      Spectral data frames
Pvbin<T>        Spectral data bins
Vector<T>       Array variables
Table           Function tables
AuxMem<T>       Dynamic memory
Thread          Multithreading
Plugin<N,M>     Plugin base class
FPlugin<N,M>    Spectral plugin base class

Table 1: Classes provided by CPOF.


5 Plugin Examples 

The Csound language has a variety of internal 
data types that its opcodes can process. We will 
look at each one of these with a programming 
example. 

5.1 Init-time opcodes 

In Csound, code that is run only once per instantiation (or again on
explicit re-initialisation) employs init-time variables. These are
scalar types holding a floating-point number (the MYFLT type defined
by the system). Plugin opcodes for these types are derived from
Plugin and are instantiated templates taking the number of output and
input arguments (respectively) as parameters. The following example
uses the standard library Gaussian generator to produce a random
number using the normal distribution. The first input argument is the
mean, followed by the deviation, and the seed:

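#include <plugin.h>
#include <random>

struct Gauss : csnd::Plugin<1, 3> {
  std::normal_distribution<MYFLT> norm;
  std::mt19937 gen;

  int init() {
    // construct the members explicitly: Csound allocates the object
    // without running any C++ constructors
    csnd::constr(&norm, inargs[0], inargs[1]);
    csnd::constr(&gen, inargs[2]);
    outargs[0] = norm(gen);
    // destroy them again, as this opcode only runs at init time
    csnd::destr(&norm);
    csnd::destr(&gen);
    return OK;
  }
};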

Note that because Csound instantiates the plugin object and it does
not know anything about C++ constructors, we need to explicitly
construct the objects norm and gen. When we are done, we need to
destruct them as they are likely to have allocated resources, which
we do not want to be left dangling. The Plugin base class gives us
the inargs and outargs objects, which contain the input and output
arguments respectively.
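Explicit construction and destruction of this kind is usually built on placement new and a direct destructor call. The sketch below shows the general technique under that assumption; the constr() and destr() helpers here are written for illustration and are not claimed to match the CPOF definitions, and draw() is a hypothetical function that mimics a host handing out raw, uninitialised storage.

```cpp
#include <cassert>
#include <new>
#include <random>
#include <utility>

// Construct an object of type T in storage that already exists
// (placement new); no memory is allocated here.
template <typename T, typename... Types>
T *constr(T *p, Types &&...args) {
  return new (p) T(std::forward<Types>(args)...);
}

// Run the destructor explicitly, without releasing the storage.
template <typename T> void destr(T *p) { p->~T(); }

using Norm = std::normal_distribution<double>;
using Rng = std::mt19937;

// Hypothetical host-side view: only raw, suitably aligned bytes are
// provided, and no C++ constructor is ever called on our behalf.
double draw(double mean, double dev, unsigned seed) {
  assert(dev > 0.);
  alignas(Norm) unsigned char nbuf[sizeof(Norm)];
  alignas(Rng) unsigned char gbuf[sizeof(Rng)];
  Norm *norm = constr(reinterpret_cast<Norm *>(nbuf), mean, dev);
  Rng *gen = constr(reinterpret_cast<Rng *>(gbuf),
                    static_cast<Rng::result_type>(seed));
  double value = (*norm)(*gen);
  destr(norm); // release anything the objects may own
  destr(gen);
  return value;
}
```

Because the generator is seeded explicitly, two calls to draw() with the same arguments return the same value, which mirrors the behaviour of an init-time opcode given a fixed seed.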

In order for the plugin to be added to Csound’s collection of
opcodes, we need to register it. To do this, we implement the
csnd::on_load() function, where we place a call to the
csnd::plugin<T>() template function, passing the opcode name we will
use ("gaussian"), the output and input argument types ("i" and
"iii"), and the action time of the opcode (thread::i):

#include <modload.h>

void csnd::on_load(Csound *csound) {
  csnd::plugin<Gauss>(csound, "gaussian", "i", "iii",
                      csnd::thread::i);
}

5.2 Control-rate opcodes 

The next data type we can tackle is the one used for control-rate
variables (k). This is also a scalar, but now the opcode is active at
performance time (as well as init). A control-rate version of the
gaussian opcode would look like this:

struct GaussP : csnd::Plugin<1, 3> {
  std::normal_distribution<MYFLT> norm;
  std::mt19937 gen;

  int init() {
    csnd::constr(&norm, inargs[0], inargs[1]);
    csnd::constr(&gen, inargs[2]);
    csound->plugin_deinit(this);
    return OK;
  }

  int deinit() {
    csnd::destr(&norm);
    csnd::destr(&gen);
    return OK;
  }

  int kperf() {
    outargs[0] = norm(gen);
    return OK;
  }
};

We can see that we have now supplied the kperf() method that will be
called repeatedly during performance. Another difference is that we
have to provide a deinit() to call the destructors, which will be
called when performance ends. This method needs to be registered
separately with Csound through the plugin_deinit() template function.
We register this version of the opcode with:

csnd::plugin<GaussP>(csound,"gaussian", 
"k", "iii", csnd::thread::ik); 

5.3 Audio-rate opcodes 

For audio signals, we need to implement the aperf() method. The
variable now is a vector, so we have to use an AudioSig object to
hold it. The following example shows an aperf() method that can be
added to GaussP to implement an audio-rate opcode:

int aperf() {
  csnd::AudioSig out(this, outargs(0));
  for (auto &sample : out)
    sample = norm(gen);
  return OK;
}

The same class can then be registered for an 
audio-rate output: 

csnd::plugin<GaussP>(csound, "gaussian",
                     "a", "iii", csnd::thread::ia);

5.4 Spectral signals 

Spectral signals in Csound are carried from opcode to opcode using
fsig variables. These are self-describing variables holding one frame
of frequency-domain data, plus associated information about the
stream. In CPOF, we manipulate these using the pv_stream class.
Similarly to audio signals, we can get the fsig data off arguments
into objects of these types for processing. An opcode is responsible
for initialising its own output stream, which we can do at init time.
Stream frames can be decomposed into separate bins held by pv_bin
objects.

The example below shows a plugin that implements spectral tracing
[Wishart, 1996], defined as retaining only the loudest N bins in each
frame. Some important aspects to note about this code: (a) spectral
processing occurs at a rate determined by the frame analysis rate, so
we run it at k-rate and process frames as they become available; (b)
a framecount, a member variable of the FPlugin base class, is kept
for this; (c) the AuxMem class is used to manage a heap-allocated
block of memory to keep bin amplitudes; and (d) we add the types as
static constant members of the class, which simplifies the plugin
registration call.

The basic algorithm is as follows: 

1. get the amplitudes from each bin;

2. find the nth loudest;

3. use this as a threshold to filter the frame data, keeping only the
bins holding amplitudes at or above it.

#include <plugin.h>
#include <algorithm>

struct PVTrace : csnd::FPlugin<1, 2> {
  csnd::AuxMem<float> amps;
  static constexpr char const *otypes = "f";
  static constexpr char const *itypes = "fk";

  int init() {
    if (inargs.fsig_data(0).isSliding())
      return csound->init_error(Str("sliding not supported"));

    if (inargs.fsig_data(0).fsig_format() != csnd::fsig_format::pvs &&
        inargs.fsig_data(0).fsig_format() != csnd::fsig_format::polar)
      return csound->init_error(Str("fsig format not supported"));

    amps.allocate(csound, inargs.fsig_data(0).nbins());
    csnd::Fsig &fout = outargs.fsig_data(0);
    fout.init(csound, inargs.fsig_data(0));
    framecount = 0;
    return OK;
  }

  int kperf() {
    csnd::pv_frame &fin = inargs.fsig_data(0);
    csnd::pv_frame &fout = outargs.fsig_data(0);

    if (framecount < fin.count()) {
      int n = fin.len() - (int)inargs[1];
      float thrsh;
      std::transform(fin.begin(), fin.end(), amps.begin(),
                     [](csnd::pv_bin f) { return f.amp(); });
      std::nth_element(amps.begin(), amps.begin() + n, amps.end());
      thrsh = amps[n];
      std::transform(fin.begin(), fin.end(), fout.begin(),
                     [thrsh](csnd::pv_bin f) {
                       return f.amp() >= thrsh ? f : csnd::pv_bin();
                     });
      framecount = fout.count(fin.count());
    }
    return OK;
  }
};

#include <modload.h>

void csnd::on_load(Csound *csound) {
  csnd::plugin<PVTrace>(csound, "pvstrace", csnd::thread::ik);
}

The standard library algorithms are very well 
suited to implementing these steps. The code 
becomes very compact and fairly readable. 
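
The three steps of the algorithm can also be tried outside Csound. The following sketch applies them to a plain std::vector; the Bin struct and the trace() function are hypothetical names introduced for this illustration and are not part of CPOF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bin type: one amplitude/frequency pair.
struct Bin {
  float amp;
  float freq;
};

// Keep the `keep` loudest bins of `in`, zeroing all the others.
std::vector<Bin> trace(const std::vector<Bin> &in, std::size_t keep) {
  assert(keep >= 1 && keep <= in.size());
  // 1. get the amplitudes from each bin
  std::vector<float> amps(in.size());
  std::transform(in.begin(), in.end(), amps.begin(),
                 [](Bin b) { return b.amp; });
  // 2. find the nth loudest: after nth_element, position n holds the
  //    value it would have in a fully sorted (ascending) sequence
  std::size_t n = in.size() - keep;
  std::nth_element(amps.begin(), amps.begin() + n, amps.end());
  float thrsh = amps[n];
  // 3. keep only the bins at or above the threshold
  std::vector<Bin> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(),
                 [thrsh](Bin b) { return b.amp >= thrsh ? b : Bin{}; });
  return out;
}
```

Using std::nth_element rather than a full sort is enough here, since only the threshold value is needed and the relative order of the remaining amplitudes is irrelevant.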

5.5 Array variables 

Csound has a container type, array, which can be used to create
vectors of built-in types. CPOF provides a template class Vector<T>
to wrap array arguments conveniently for manipulation. The typedef
myfltvec is an instantiation of this template for real values
(MYFLT). The following example combines the use of lambdas and
templates to create a whole family of binary (two-operand) operators
for numeric (scalar) arrays. It can be used for init and k-rate
opcodes. The processing is placed in a separate function to avoid
code duplication. It is just a matter of mapping the inputs into the
outputs through the application of a given function.

template <MYFLT (*bop)(MYFLT, MYFLT)>
struct ArrayOp2 : csnd::Plugin<1, 2> {
  int process(csnd::myfltvec &out, csnd::myfltvec &in1,
              csnd::myfltvec &in2) {
    std::transform(in1.begin(), in1.end(), in2.begin(), out.begin(),
                   [](MYFLT f1, MYFLT f2) { return bop(f1, f2); });
    return OK;
  }

  int init() {
    csnd::myfltvec &out = outargs.myfltvec_data(0);
    csnd::myfltvec &in1 = inargs.myfltvec_data(0);
    csnd::myfltvec &in2 = inargs.myfltvec_data(1);
    if (in2.len() < in1.len())
      return csound->init_error(
          Str("second input array is too short\n"));
    out.init(csound, in1.len());
    return process(out, in1, in2);
  }

  int kperf() {
    return process(outargs.myfltvec_data(0),
                   inargs.myfltvec_data(0),
                   inargs.myfltvec_data(1));
  }
};

This class template is then instantiated to create various opcodes
based on different two-operand functions:

csnd::plugin<ArrayOp2<std::atan2>>(csound, "taninv",
    "i[]", "i[]i[]", csnd::thread::i);
csnd::plugin<ArrayOp2<std::atan2>>(csound, "taninv",
    "k[]", "k[]k[]", csnd::thread::ik);
csnd::plugin<ArrayOp2<std::pow>>(csound, "pow",
    "i[]", "i[]i[]", csnd::thread::i);
csnd::plugin<ArrayOp2<std::pow>>(csound, "pow",
    "k[]", "k[]k[]", csnd::thread::ik);
csnd::plugin<ArrayOp2<std::hypot>>(csound, "hypot",
    "i[]", "i[]i[]", csnd::thread::i);
csnd::plugin<ArrayOp2<std::hypot>>(csound, "hypot",
    "k[]", "k[]k[]", csnd::thread::ik);
csnd::plugin<ArrayOp2<std::fmod>>(csound, "fmod",
    "i[]", "i[]i[]", csnd::thread::i);
csnd::plugin<ArrayOp2<std::fmod>>(csound, "fmod",
    "k[]", "k[]k[]", csnd::thread::ik);
csnd::plugin<ArrayOp2<std::fmax>>(csound, "fmax",
    "i[]", "i[]i[]", csnd::thread::i);
csnd::plugin<ArrayOp2<std::fmax>>(csound, "fmax",
    "k[]", "k[]k[]", csnd::thread::ik);
csnd::plugin<ArrayOp2<std::fmin>>(csound, "fmin",
    "i[]", "i[]i[]", csnd::thread::i);
csnd::plugin<ArrayOp2<std::fmin>>(csound, "fmin",
    "k[]", "k[]k[]", csnd::thread::ik);

This is a good example of how we can apply a modern C++ idiom to
create compact code for the generation of a family of related
opcodes.
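
The idiom itself, a function pointer used as a non-type template parameter, does not depend on Csound. The sketch below reproduces it with std::vector; BinOp, hypot2 and max2 are hypothetical names, and the small wrapper functions are an assumption made here so that each template argument names exactly one function with a fixed signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Wrappers with one fixed signature, so that each name denotes exactly
// one function (the <cmath> names are overloaded).
inline double hypot2(double a, double b) { return std::hypot(a, b); }
inline double max2(double a, double b) { return std::fmax(a, b); }

// The binary operation is a compile-time template argument, so each
// instantiation is a distinct type with the call resolved statically.
template <double (*bop)(double, double)> struct BinOp {
  std::vector<double> operator()(const std::vector<double> &in1,
                                 const std::vector<double> &in2) const {
    assert(in2.size() >= in1.size());
    std::vector<double> out(in1.size());
    std::transform(in1.begin(), in1.end(), in2.begin(), out.begin(),
                   [](double f1, double f2) { return bop(f1, f2); });
    return out;
  }
};
```

BinOp<hypot2> and BinOp<max2> are then two separate operators generated from the same few lines, in the same way that each opcode in the family above is one instantiation of the class template.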

6 Conclusions 

Perhaps one of the conclusions of this paper is that C++ is not such
a terrible choice for the implementation of computer music
instruments. While C is still the preeminent language for audio
signal processing, the latest C++ standards have made that language
somewhat more interesting, providing almost a blend of high-level
scripting with a (hopefully) efficient implementation.


References 

ISO/IEC. 2011. ISO international standard ISO/IEC 14882:2011,
programming language C++.

ISO/IEC. 2014. ISO international standard ISO/IEC 14882:2014,
programming language C++.

ISO/IEC. 2017. Working draft, standard for programming language C++.

V. Lazzarini, J. ffitch, S. Yi, J. Heintz, Ø. Brandtsegg, and
I. McCurdy. 2016. Csound: A Sound and Music Computing System.
Springer Verlag.

V. Lazzarini. 2000. The SndObj sound object library. Organised
Sound, (5):35-49.

V. Lazzarini. 2013. The development of computer music programming
systems. Journal of New Music Research, (42):97-110.

V. Lazzarini. 2017a. Computer Music Instruments. Springer Verlag.

V. Lazzarini. 2017b. The Csound plugin opcode framework. In SMC 2017
(under review), Helsinki.

V. Lazzarini. 2017c. The design of a lightweight DSP programming
language. In SMC 2017 (under review), Helsinki.

Y. Orlarey, D. Fober, and S. Letz. 2004. Syntactical and semantical
aspects of Faust. Soft Computing, 8(9):623-632.

T. Wishart. 1996. Audible Design. Orpheus the Pantomime.



LAC2017 - CIEREC - GRAME - Universite Jean Monnet - Saint-Etienne - France 




Meet the Cat: Pd-L2Ork and its
New Cross-Platform Version “Purr Data”


Ivica Ico Bukvic 

Virginia Tech SOPA ICAT
DISIS L2Ork

Blacksburg, VA, USA 24061
ico@vt.edu


Albert Gräf

Johannes Gutenberg
University (JGU)
IKM, Music-Informatics
Mainz, Germany
aggraef@gmail.com


Jonathan Wilkes 

jon.w.wilkes@gmail.com 


Abstract 

The paper reports on the latest developments of Pd-L2Ork, a fork of
Pd-extended created by Ico Bukvic in 2010 for the Linux Laptop
Orchestra (L2Ork). Pd-L2Ork offers many usability improvements and a
growing set of objects designed to lower the learning curve and
facilitate rapid prototyping. Started in 2015 by Jonathan Wilkes,
Purr Data is a cross-platform port of Pd-L2Ork which has recently
been released as Pd-L2Ork version 2. It features a complete GUI
rewrite and Mac/Windows support, leveraging JavaScript and
Node-Webkit as a replacement for Pd’s aging Tcl/Tk-based GUI
component.

Keywords 

Pd-L2Ork, Purr Data, fork, usability, L2Ork

1 Introduction 

Pure Data, also known as Pd [15], is arguably one of the most
widespread audio and multimedia dataflow programming languages. Pd’s
history is deeply intertwined with that of its commercial
counterpart, Cycling ’74’s Max [16]. A particular strength shared by
the two platforms is their modularized approach, which empowers
third-party developers to extend the functionality without having to
deal with the underlying engine. Perhaps the most profound impact of
Pd is in its completely free and open source model that has enabled
it to thrive in a number of environments inaccessible to its
commercial counterpart. Examples include custom in-house solutions
for entertainment software (e.g. EaPd [10]), Unity3D [18] and
smartphone integration via libPd [1], an embeddable library (e.g.
RjDj [11], PdDroidParty [12], and Mobmuplat [9]), and other embedded
platforms, such as Raspberry Pi [4].

Pd’s author Miller Puckette has spearheaded a steady development pace
with the primary motivation being iterative improvement while
preserving backwards compatibility. Puckette’s work on Pd continues
to be instrumental in fostering creativity and curiosity across
generations, and as the library of works relying on Pd grows, so does
the importance of conservation and ensuring that Pd continues to
support even the oldest of patches. However, the inevitable
side-effect of the increasingly conservationist focus of the core Pd
is that any new addition has to be carefully thought out in order to
account for all the idiosyncrasies of past versions and ensure there
is a minimal chance of a regression. This vastly limits the
development pace.

As a result, the Pd community sought to complement Pure Data’s
compelling core functionality with a level of polish that would lower
the initial learning curve and improve user experience. In 2002 the
community introduced the earliest builds of Pd-extended [13], the
longest running Pd variant. There were other ambitious attempts, like
pd-devel, Nova, and DesireData [14], and in recent years Pd has seen
a resurgence in forks that aim to sidestep usability issues through
alternative approaches, including embeddable solutions (e.g. libPd)
and custom front ends. Pd-extended was probably the most popular
alternative Pd version, and it continues to be used by many, even
though it was abandoned in 2013 by its maintainer Hans-Christoph
Steiner due to lack of contributors to the project.

Pd-L2Ork presents itself as a viable alternative which started out as
a fork of Pd-extended and continues to be actively maintained. We
begin with a discussion of Pd-L2Ork’s history, motivation and
implementation. We then look at Pd-L2Ork’s most recent offspring,
nicknamed “Purr Data”, which has recently been released as Pd-L2Ork
version 2, runs on Linux, Mac and Windows, and offers some unique new
features, most notably a completely new and improved GUI component.
The paper concludes with some remarks on availability and avenues for
future developments.

2 History and Motivation

Introduced in 2009 by Bukvic, Pd-L2Ork [2] started as a Pd-extended
0.42.5 variant. The focus was on nimble development designed to cater
to the specific needs of the Linux Laptop Orchestra (L2Ork), even if
that meant suboptimal initial implementations that would be ironed
out over time as the understanding of the overall code base improved
and the target purpose was better understood through practice.

An important part of L2Ork’s mission was educational outreach.
Consequently, a majority of early additions to Pd-extended focused on
usability improvements, including graphical user interface and editor
functions. While some of these were incorporated upstream, a growing
number of rejected patches began to build an increasing divide
between the two code bases. As a result, in 2010 Bukvic introduced a
separately maintained Pd-extended variant, named Pd-L2Ork after
L2Ork, for which it was originally designed.

Over time, as the project grew in its scope and visibility, it
attracted new users, and eventually a team of co-developers,
maintainers and contributors formed around it. This is obviously
important for the long-term viability of the project, so that it
doesn’t fall victim to Pd-extended’s fate, and thus the development
team continues to invite all kinds of contributions.

Pd-L2Ork’s philosophy grew out of its initial goals and the early
development efforts. It is defined by a nimble development process
allowing both major and iterative code changes for the sake of
improving usability and stability as quickly as possible. Another
important aspect of this philosophy is releasing improvements early
and often in order to have working iterations in the hands of dozens
of students of varying educational backgrounds and experience, which
ensured quick vetting of the ensuing solutions.

Despite an ostensibly lax outlook on backwards compatibility, to date
Pd-L2Ork and Purr Data remain compatible with Pd (the -legacy flag
can be used to disable some of the more disruptive changes). In
particular, there haven’t been any changes in the patch file format,
so patches created in Pd still work without any ado in Pd-L2Ork and
vice versa (assuming that they don’t use any externals which aren’t
available in the target environment). Also, communication between GUI
and engine still happens through sockets, so that the two can run in
separate processes (running the engine with real-time priorities).

Like Pd-extended, Pd-L2Ork provides a single turnkey monolithic
solution with all the libraries included in one package. This
minimizes overhead in configuring the programming environment and
installing supplemental libraries, and addresses the potential for
binary incompatibility with Pd.

3 Implementation 

Pd-L2Ork’s code base increasingly diverges from Pd. It consists of
many bug-fixes, additions and improvements, which can be split into
engine, usability, documentation, new and improved objects and
libraries, scaffolded learning and rapid prototyping. In this section
we highlight some of the most important user-visible changes and
additions; more details can be found in the authors’ PdCon paper [3].

3.1 Engine 

Internal engine contributions have largely focused on implementing
features and bug-fixes requested by past and existing Pd users. Some
of these include patches that have never made it to the core Pd, such
as the cord inspector (a.k.a. magic glass), improved data type
handling logic, and support for outlier cases that may otherwise
result in crashes and unexpected behavior. Additional checks were
implemented for the Jack [6] audio backend to avoid hangs in case
Jack freezes. Default sample rate settings are provided for
situations where Pd-L2Ork may run headless (without GUI), thus
removing the need for potentially unwieldy headless startup
procedures. The $0 placeholder in messages now automatically resolves
to the patch instance, while the $@ argument can be used to pass the
entire argument set inside a sub-patch or an abstraction.¹
[trigger]² logic has been expanded to allow for static allocation of
values, which alleviates the need for creating bang triggers that are
fed into a message with a static value.
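
The effect of the two placeholders can be illustrated with a short sketch. This is not the Pd-L2Ork engine code (which is written in C); expand() is a hypothetical function operating on an already tokenised message, and the way the instance number and arguments are represented here is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Replace "$0" with the patch instance number and "$@" with the whole
// argument list; every other token is passed through unchanged.
std::vector<std::string> expand(const std::vector<std::string> &msg,
                                int instance,
                                const std::vector<std::string> &args) {
  assert(instance >= 0);
  std::vector<std::string> out;
  for (const auto &tok : msg) {
    if (tok == "$0")
      out.push_back(std::to_string(instance));
    else if (tok == "$@")
      out.insert(out.end(), args.begin(), args.end());
    else
      out.push_back(tok);
  }
  return out;
}
```

For a patch instance numbered 1003 created with the arguments 440 and sine, the message "set $0 $@" would expand to "set 1003 440 sine" under these assumptions.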

Visual improvements: The Tk-based [19] graphical engine has been
replaced with TkPath [17] which offers an SVG-enabled antialiased
canvas.³ A lot of effort went into streamlining “graph-on-parent”
(Pd’s facility to draw GUI elements in a subpatch on its parent),
including proper bounding box calculation and detection, optimizing
redraw, and resolving drawing issues with embedded graph-on-parent
patches. Improvements also focused on sidestepping the limitations of
the socket-based communication between the GUI and the engine, such
as keyboard autorepeat detection. As a result, the [key] object can
be instantiated with an optional argument that enables autorepeat
filtering, while retaining backward compatibility.

¹ In Pd parlance, an abstraction is a Pd patch encapsulating some
functionality to be used as a subpatch in other patches.

² Here and in the following we employ the usual convention to
indicate Pd objects by enclosing them in brackets.

Stacking order: Another substantial core engine overhaul pertains to
consistent ordering of objects in the glist (a.k.a. canvas) stack.
This has helped ensure that objects always honor the visual stacking
order, even after undo and redo actions, and has paved the way
towards more advanced functionality including advanced editing
techniques and a system-wide preset engine.

Presets: The preset engine consists of two new objects, [preset_hub]
and [preset_node]. Nodes can be connected to various objects,
including arrays, and can broadcast the current state to their
designated hub for storing and retrieval. Multiple hubs can be used
with varying contexts. The ensuing system is universal, efficient,
unaffected by editing actions, and abstraction- and instance-agnostic
(e.g., using multiple instances of the same abstraction is
automatically supported). It supports anything from recording
individual states to real-time automation of multiple parameters
through periodic snapshots.
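
The hub and node relationship can be pictured with a small data-structure sketch. It is only a model of the idea described above, not the implementation of [preset_hub] and [preset_node]: the Hub and Node classes, their method names, and the use of a unique name to identify each node are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical hub: preset number -> (node name -> stored value).
class Hub {
  std::map<int, std::map<std::string, double>> m_presets;

public:
  void store(int preset, const std::string &node, double value) {
    m_presets[preset][node] = value;
  }

  // Returns true and writes `value` if the preset has data for `node`.
  bool recall(int preset, const std::string &node, double &value) const {
    auto p = m_presets.find(preset);
    if (p == m_presets.end())
      return false;
    auto n = p->second.find(node);
    if (n == p->second.end())
      return false;
    value = n->second;
    return true;
  }
};

// Hypothetical node: tracks one parameter and reports to its hub.
class Node {
  Hub &m_hub;
  std::string m_name;
  double m_value;

public:
  Node(Hub &hub, std::string name, double value = 0.)
      : m_hub(hub), m_name(std::move(name)), m_value(value) {
    assert(!m_name.empty());
  }

  void set(double value) { m_value = value; }
  double value() const { return m_value; }

  // Broadcast the current state to the hub under a preset number.
  void store(int preset) { m_hub.store(preset, m_name, m_value); }

  // Restore the state; the value is left untouched if nothing is stored.
  bool recall(int preset) {
    return m_hub.recall(preset, m_name, m_value);
  }
};
```

In this model, a node that has no data in a given preset simply keeps its current value on recall, and several hubs can coexist because each node holds a reference to its own designated hub.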

Data structures: Data structures are an advanced feature of Pd to
produce visualizations of data collections such as interactive
graphical scores. Pd-L2Ork enhances these with the addition of
sprites and new ways to manipulate the data.

3.2 Usability 

On the surface Pd-L2Ork builds on Pd-extended’s appearance
improvements. Under the hood, with the canvas being drawn as a
collection of SVG shapes, the entire ecosystem lends itself to a
number of new opportunities. The most obvious involve antialiased
display, advanced shapes (e.g. Bezier curves that are also used for
drawing patch cords), support for image formats with alpha channel,
and advanced data structure drawing and manipulation using
SVG-centric enhancements (Fig. 1).

³ SVG = Scalable Vector Graphics, a widely used vector image format
standardized by the W3C.

Figure 1: Pd-L2Ork running on Linux.

A majority of usability improvements focus on the editor. The
consistent stacking order implemented in the engine has served as a
foundation for the infinite undo, as well as to-front and to-back
stacking options that are accessible via the right-click context
menu. Lots of improvements and polishing went into the iemgui
objects, such as improved positioning, enhanced properties dialogs
and graph-on-parent behavior.

The old autotips patch was integrated (and improved upon). The tidy
up feature has been redesigned to offer a two-step realignment of
objects: the first key press aligns the objects on a single axis,
while the second respaces them so that they are equidistant from each
other. Intelligent patching was implemented to provide four variants
of automatically generating multiple patch cords based on the user’s
selection, and to provide additional ways of creating multiple
connections (e.g. SHIFT + mouse click). The canvas scrolling logic
has been overhauled to minimize the use of scrollbars, provide
minimal visual footprint, and ensure most of the patch is always
visible.
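
The two-step tidy up operation can be sketched as two small functions. This is an illustration of the idea only: the Box struct is hypothetical, and the specific choices made here (aligning to the mean vertical position, and keeping the outermost objects fixed when respacing) are assumptions rather than a description of the Pd-L2Ork editor code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object position on the canvas.
struct Box {
  double x;
  double y;
};

// Step 1: align all boxes on a single axis (here: a common y).
void align(std::vector<Box> &boxes) {
  if (boxes.empty())
    return;
  double sum = 0.;
  for (const auto &b : boxes)
    sum += b.y;
  double mean = sum / boxes.size();
  for (auto &b : boxes)
    b.y = mean;
}

// Step 2: respace the boxes so that neighbours are equidistant along
// x, keeping the leftmost and rightmost positions unchanged.
void respace(std::vector<Box> &boxes) {
  if (boxes.size() < 3)
    return;
  std::sort(boxes.begin(), boxes.end(),
            [](const Box &a, const Box &b) { return a.x < b.x; });
  double first = boxes.front().x;
  double last = boxes.back().x;
  double step = (last - first) / (boxes.size() - 1);
  assert(step >= 0.);
  for (std::size_t i = 0; i < boxes.size(); i++)
    boxes[i].x = first + i * step;
}
```

Calling align() and then respace() corresponds to the first and second key press respectively; splitting the operation lets the user stop after the first step when only alignment is wanted.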

Pd-L2Ork supports drag and drop, as well as pasting Pd code snippets
(using Pd’s “FUDI” format) directly onto the canvas. The copy and
paste engine has been overhauled to improve buffer sharing across
multiple application instances. The entire graphics engine is
themeable and its settings are by default saved with the rest of the
configuration files.

3.3 Object Libraries 

Apart from the core Pd objects and improvements described in the
Engine section above, Pd-L2Ork offers a growing number of revamped
objects while also pruning redundant and unnecessary objects.

Special attention was given to supporting the Raspberry Pi (RPi)
platform with a custom set of objects designed specifically to
harness the full potential of the RPi GPIO and I2C interfaces,
including [disis_gpio] and [disis_spi] [4]. The cyclone library has
received new documentation and a growing number of bugfixes and
improvements. The ggee library’s [image] has received a significant
overhaul and became the catchall solution for image manipulation. In
addition to the standard Pd-extended libraries, Pd-L2Ork has
reintroduced [disis_munger~] and an upgraded version of the [fluid~]
soundfont synth external, which depend on the flext library. Other
libraries include fftease, lyonpotpourri, and RTcmix~. Pd-L2Ork
bundles advanced networking externals [disis_netsend] and
[disis_netreceive], convenience externals like [patch_name], and
abstractions (e.g., those of the K12 learning module [5], and a
growing number of L2Ork-specific abstractions designed to foster
rapid prototyping). A few libraries have been removed due to lack of
support and/or GUI object implementations that utilize hardwired
Tcl-specific workarounds.

3.4 Introspection 

Most interpreted languages have mechanisms to do introspection.
Pd-L2Ork features a collection of “info” classes for retrieving the
state of the program on a number of levels, from the running Pd
instance to individual objects within patches. Four classes provide
the basic functionality:

• [pdinfo] reflects the state of the running Pd instance, including
dsp state, available/connected audio and midi devices, platform,
executable directory, etc.

• [canvasinfo] is a symbolic receiver for the canvas, abstraction
arguments, patch filename, list of current objects, etc. The object
takes a numeric argument to query the state of parent or ancestor
canvases.

• [classinfo] offers information about the currently loaded classes
in the running instance. This includes creator argument types, as
well as the various methods.

• [objectinfo] returns bounding box, class type, and size for a
particular object on the canvas.


While the introspection provided by these classes is relatively
rudimentary, it alleviates the need for a large number of external
libraries that add missing core functionality. For example, Pd-L2Ork
ships with several compiled externals whose purpose is to fetch the
list of abstraction arguments. These externals all have different
interfaces and are spread across various libraries. Having one
standard built-in interface for fetching arguments that behaves
similarly to other introspection interfaces improves the usability of
the system. Furthermore, opening up rudimentary introspection to the
user increases the composability of Pd. Functionality that previously
only existed inside the C code can now be implemented as an
abstraction (i.e., in Pd itself). Such abstractions don’t require
compilation, which makes them accessible to a wider range of users
who can test and improve them.

4 Purr Data a.k.a. “The Cat” 

Despite all of the improvements it brings to the table, Pd-L2Ork
still employs the same old Tcl/Tk environment to implement its
graphical user interface. This is both good and bad. The major
advantage is compatibility with the original Pd. On the other
hand, Tcl/Tk looks and feels quite dated as a GUI toolkit in this
day and age. Tcl is a rather basic programming language and its
libraries have been falling behind, making it hard to integrate
the latest GUI, multimedia and web technologies. Last but not
least, Pd-L2Ork’s adoption was severely hampered by the fact that
it relies on some lesser-used Tcl/Tk extensions (specifically,
TkPath and the Tcl Xapian bindings) which are not well-supported
on current Mac and Windows systems, and thus would have required
substantial porting effort to make Pd-L2Ork work there.

Purr Data was created in 2015 by Wilkes to address these problems.
The basic idea was to replace the aging Tcl/Tk GUI engine with a
modern, open-source, well-supported cross-platform framework
supporting programmability and the required advanced 2D graphical
capabilities, without being tied into a particular GUI toolkit
again.

4 Readers may wonder about the nick-name of this Pd-L2Ork
offspring, to which the author in his original announcement at
http://forum.pdpatchrepo.info only offered the explanation,
“because cats.” Quite obviously the name is a play on “Pure Data”
on which “Purr Data” is ultimately based, but it also raises
positive connotations of soothing purring sounds.

Employing modern web technologies seemed an obvious choice to
achieve those goals, as they are well-supported, cross-platform
and toolkit-agnostic, programmable (via JavaScript), and offer an
extensive programming library and built-in SVG support (as a
substitute for Pd-L2Ork’s use of TkPath which incidentally follows
the SVG graphics model).

There are basically two main alternatives in this realm, nw.js 5
a.k.a. “node-webkit” and Electron 6 . These both essentially offer
a stand-alone web browser engine combined with a JavaScript
runtime. nw.js was chosen because it offers some technical
advantages deemed important for Purr Data (in particular, an
easier interface to create multi-window applications and better
support for legacy Windows systems).

So, in a nutshell, Purr Data is Pd-L2Ork with the Tcl/Tk GUI part
ripped out and replaced with nw.js. Purr Data’s GUI is written
entirely in JavaScript, which is a much more advanced programming
language than Tcl with an abundance of libraries and support
materials. Patches are implemented as SVG documents which are
generally much more responsive and offer better graphical
capabilities than Tk windows. They can also be zoomed to 16
different levels and themed using CSS, improving usability. The
contents of a patch window are drawn and manipulated using the
HTML5 API. Thus the code to display Pd patches is very portable
and will work in any modern GUI toolkit that has a webview widget.

There are also some disadvantages with this approach. First, Tcl
code in Pd’s core and in the externals needs to be ported to
JavaScript to make it work with the new GUI; we’ll touch on this
in the following subsection.

Second, the size of the binary packages is much larger than with
Pd-L2Ork or Pd-extended since, in order to make the packages
self-contained, they also include the full nw.js binary
distribution. This is a valid complaint about many of the
so-called “portable desktop applications” being offered these
days, but in the case of Purr Data it is mitigated by the fact
that plain Pd-L2Ork is not exactly a slim package either.

Third, the browser engine has a much higher 
memory footprint than Tcl/Tk which might be 
an issue on embedded platforms with very tight 
memory constraints. 

5 https://nwjs.io/

6 https://electron.atom.io/ 


So far, none of these issues has turned out to be a major
road-block in practice. The most serious issue we’re facing right
now probably is that externals using Pd’s Tcl/Tk facilities need
to have their GUI code rewritten to make it work with Purr Data;
this is a substantial undertaking and thus hasn’t been done for
all bundled externals yet.

4.1 Implementation 

Using JavaScript in lieu of Tcl as the GUI programming language
poses some challenges. Tcl commands with Tk window strings are
hard-coded into the C source files of Pd. This means that any port
to a different toolkit must either replace those commands with an
abstract interface, or write middleware that turns the hard-coded
Tcl strings into abstract commands. Given the complexity of Tcl
commands in both the core and external libraries, that middleware
would essentially have to re-implement a large part of the Tcl
interpreter.

Consequently, Purr Data opted for the former 
approach of directly implementing an abstract 
interface. This takes the form of a JavaScript 
API providing the necessary GUI tie-ins to the 
engine and externals, which is called from the C 
side using a new set of functions (gui_vmess et 
al) which replace the corresponding functions 
of Pd’s C API (sys_vgui etc.). As already 
mentioned, this means that externals which use 
these facilities need to have their GUI code 
rewritten to make it work with the new GUI. 
(Affected externals will work, albeit without 
their GUI features.) 
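
The idea of an abstract interface can be illustrated with a minimal Python sketch. It is not Purr Data's actual API: the names `encode_gui_message`, `GuiFrontend`, and `draw_rect`, as well as the JSON wire format, are assumptions made purely for illustration. Instead of a toolkit-specific command string, the engine side sends a command name plus arguments, and the GUI side looks the name up in a table of handlers.

```python
import json

def encode_gui_message(selector, *args):
    """Serialize a toolkit-neutral GUI command (hypothetical wire format)."""
    return json.dumps([selector, *args])

class GuiFrontend:
    """Stands in for the GUI side: maps command names to handlers."""
    def __init__(self):
        self.handlers = {}
        self.shapes = {}

    def register(self, selector, handler):
        self.handlers[selector] = handler

    def receive(self, line):
        selector, *args = json.loads(line)
        handler = self.handlers.get(selector)
        if handler is None:
            return False  # unknown command: nothing is drawn, nothing crashes
        handler(*args)
        return True

gui = GuiFrontend()

def draw_rect(canvas_id, tag, x, y, w, h):
    # Record the shape; a real front end would create an SVG element here.
    gui.shapes[(canvas_id, tag)] = ("rect", x, y, w, h)

gui.register("draw_rect", draw_rect)
gui.receive(encode_gui_message("draw_rect", "x1", "obj7", 10, 20, 40, 18))
```

Because the message carries no toolkit syntax, the same engine-side call could in principle be served by any front end that registers a handler under the same name.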

Adding to the porting difficulty is the fact that Pd has no formal
specification, and its GUI interface follows no common design
pattern for 2D graphics. For example, the graph-on-parent window
appears at a glance as a viewport that clips to a specified
bounding box. However, the bounding box itself behaves
inconsistently: for built-in widgets like [hslider] or [bng] it
clips (per widget, not per pixel), but for graphed arrays, data
structure visualizations, and widget labels it does no clipping at
all.

To get to grips with these problems, Purr Data’s JavaScript GUI
implementation draws and manipulates Pd patch windows using the
HTML5 API, which is widely documented and used. The Pd canvas
itself is implemented as an SVG document. SVG was chosen because
it is a mature, widely-used 2D API. Also, larger canvas sizes have
little to no performance impact on the responsiveness of the
graphics. Since Pd patches can be large, this makes SVG a better
choice for drawing a Pd canvas than the standard HTML5 canvas.

4.2 Leveraging HTML5 and SVG to 
Improve Pd Data Structures 

Purr Data employs a small subset of the SVG specification to
implement quite substantial improvements to data structure
visualization. Inheriting from a pre-existing standards-based 2D
API has several advantages over an ad-hoc approach. First, if
implemented consistently, the existing SVG documentation can be
used to test and teach the system. Second, it is not necessary to
immediately understand all the design choices of the entire
specification in order to implement parts of it. Since those parts
have been used and tested in a variety of mature applications, it
is easier to avoid mistakes that often riddle designs made by
developers who aren’t graphics experts. Finally, there is less
risk of a standards-based API becoming abandoned than a more
esoteric API.

To improve data structure visualizations, several [draw] commands
were added to support the basic shape/object types in SVG. The
currently supported types are circle, ellipse, rect, line,
polyline, polygon, path, image, and g. 7 Each has a number of
methods which map directly to SVG graphical attributes. Methods
were also added for Document Object Model (DOM) events to trigger
notifications to the outlet of each object.

The screenshot in Fig. 2 shows the “SVG 
tiger” drawn from a few hundred paths found 
inside the [draw g] object. Even though 
the drawing is complex, Purr Data caches the 
bounding box for the tiger object to prevent 
the hit-testing from causing dropouts. One can 
mouse over the tiger and trigger real-time audio 
synthesis. 
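
The text only states that the bounding box is cached; how Purr Data performs hit-testing internally is not described here. The following Python sketch shows the general technique under that caveat. The class `ShapeGroup` and its methods are hypothetical names: the bounding box of a group of paths is computed once, reused for every pointer test, and invalidated only when the geometry changes.

```python
class ShapeGroup:
    """A group of polylines with a lazily computed, cached bounding box."""
    def __init__(self, paths):
        self.paths = [list(p) for p in paths]  # each path: list of (x, y)
        self._bbox = None
        self.bbox_computations = 0  # counts how often the box was rebuilt

    def bbox(self):
        if self._bbox is None:
            xs = [x for path in self.paths for x, _ in path]
            ys = [y for path in self.paths for _, y in path]
            self._bbox = (min(xs), min(ys), max(xs), max(ys))
            self.bbox_computations += 1
        return self._bbox

    def add_path(self, path):
        self.paths.append(list(path))
        self._bbox = None  # geometry changed: invalidate the cache

    def hit(self, x, y):
        """Cheap test against the cached box instead of every path."""
        x0, y0, x1, y1 = self.bbox()
        return x0 <= x <= x1 and y0 <= y <= y1

tiger = ShapeGroup([[(0, 0), (10, 5)], [(3, -2), (8, 12)]])
hits = [tiger.hit(x, 1) for x in range(-5, 15)]  # 20 pointer positions
```

Twenty pointer tests trigger a single pass over the path data, which is the property that keeps mouse tracking cheap even for a drawing with hundreds of paths.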

It is also possible to set parameters for most of the [draw]
attributes. For instance, the message opacity z can be sent to set
a shape’s opacity to be whatever the value of the field z happens
to be for a particular instance of the data structure. As soon as
the value of z changes, Pd then automatically updates the opacity
of the corresponding shape accordingly.

7 The latter g element denotes a “group”, which is implemented as
a special kind of subpatch that allows the attributes of several
[draw] commands to be changed simultaneously.



4.3 Custom GUI Elements 

As the SVG tiger example shows, Purr Data makes it possible to
bind HTML5 DOM events to SVG shapes. Reporting the events is not
enabled by default, but can be switched on by simply sending the
appropriate Pd message to the [draw] object, such as the
mouseover 1 message in Fig. 2. Each [draw] object has an outlet
which then emits messages when events like mouse-over, movements
and clicks are detected.

It goes without saying that this considerably expands Pd’s
capabilities to deal with user interactions, e.g., if the user
wants to modify elements of a graphical score in real-time. But it
also paves the way for enabling users to design any kind of GUI
element in plain Pd, without having to learn a “real” programming
language and its frameworks.

For instance, Fig. 3 shows a collection of three knobs drawn using
the new SVG [draw] commands, whose values (represented by the r
field in the nub data structure, which is linked to the rotation
angles of the knobs) can be manipulated by dragging the mouse up
or down. The values can then be read from the data structure using
Pd’s built-in [get] object and used for whatever purpose, just
like with any of the built-in GUI elements.
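
The knob logic itself lives in a Pd patch, but the mapping from a vertical drag to a rotation value can be sketched in a few lines of Python. Everything below is an assumption for illustration (the class name `Knob`, the range of -135 to 135 degrees, and one degree per pixel); only the field name r and the drag direction come from the description above.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))

class Knob:
    """Hypothetical knob whose field r doubles as its rotation in degrees."""
    def __init__(self, r=0.0, r_min=-135.0, r_max=135.0, degrees_per_pixel=1.0):
        self.r = r
        self.r_min = r_min
        self.r_max = r_max
        self.degrees_per_pixel = degrees_per_pixel

    def drag(self, dy):
        """Apply a vertical mouse movement; dragging up (negative dy) raises r."""
        self.r = clamp(self.r - dy * self.degrees_per_pixel,
                       self.r_min, self.r_max)
        return self.r

    def normalized(self):
        """Map r to the range 0..1, e.g. for use as a control value."""
        return (self.r - self.r_min) / (self.r_max - self.r_min)

knob = Knob()
knob.drag(-45)   # mouse moved 45 pixels upward
```

Clamping keeps the value inside the knob's travel no matter how far the pointer is dragged, and the normalized value is what a patch would typically scale to its own parameter range.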

Figure 3: Custom GUI elements.

Pd offers a rather limited collection of built-in GUI elements to
be used in patches, and extending that collection needs a
developer proficient in both C and Tcl/Tk. Purr Data’s new SVG
visualizations totally change the game, because any Pd user can do
them without specialized programming knowledge. We thus expect the
facilities sketched out above to be used a lot by Pd users who
want to enrich their patches with new kinds of GUI elements. As
soon as it becomes possible to conveniently package such custom
GUI elements as graph-on-parent abstractions, we hope to see the
proliferation of GUI element libraries which can then be used by
Pd users and modified for their own purposes.

5 Getting Pd-L2Ork

The sources of Pd-L2Ork and Purr Data are currently being
maintained in two separate git repositories. 8 There are plans to
merge the two repositories again at some point, so that both
versions will become two branches in the same repository, but this
has not happened yet.

For Purr Data there is also a GitHub mirror available at
https://agraef.github.io/purr-data/. This is mainly used as a
one-stop shop to make it easy for users to get their hands on the
latest source and the available releases, including pre-built
packages for Linux, macOS and Windows.

Because of Pd-L2Ork’s addons and its comprehensive set of bundled
externals, the software has a lot of dependencies and a fairly
complicated (and time-consuming) build process. So, while the
software can be built straight from the source, it is usually much
easier to use one of the available binary packages:

• Virginia Tech’s official Pd-L2Ork packages are available at
http://l2ork.music.vt.edu/main/make-your-own-l2ork/software/.

• Jonathan Wilkes’ Purr Data packages can be found at
https://github.com/agraef/purr-data/releases.

• JGU also offers Pd-L2Ork and Purr Data packages for Ubuntu and
Arch Linux. Web links and installation instructions can be found
at http://l2orkubuntu.bitbucket.org/ and
http://l2orkaur.bitbucket.org/, respectively.

8 cf. https://github.com/pd-l2ork/pd and
https://git.purrdata.net/jwilkes/purr-data

The JGU packages can be installed alongside each other, so that
you can run both “classic” Pd-L2Ork and Purr Data on the same
system. (This may be useful, e.g., if you plan to use Pd-L2Ork’s
K12 mode which has not been ported to Purr Data yet.) We mention
in passing that JGU’s binary package repositories also contain
Pd-L2Ork and Purr Data versions of the Faust and Pure extensions
which further enhance Pd’s programming capabilities. 9

6 Future Work 

After Purr Data’s initial release as Pd-L2Ork 2.0 in February
2017, “classic” Pd-L2Ork has become version 1.0 and gone into
maintenance mode. While development will continue on the Purr Data
branch, we will keep the original Pd-L2Ork available until all of
Pd-L2Ork’s features have been ported or have suitable replacements
in Purr Data.

Purr Data has matured a lot in the past few months, but like any
project of substantial size and complexity it still has a few bugs
and rough edges we want to address after the initial release, in
particular:

• Port the remaining missing features from Pd-L2Ork (autotips and
K12 mode).

• Port legacy Tcl code that is still present in the GUI features
of some of the 3rd party externals.

• Some code reorganization is in order, along with a complete
overhaul of the current build system.

One interesting direction for future research is leveraging the
new SVG visualizations as a means to create custom GUI elements in
plain Pd, i.e., as ordinary Pd abstractions. This will make it
much easier for users to create their own GUI elements, and will
hopefully encourage community contributions resulting in libraries
of custom GUI objects ready to be used and modified by Purr Data
users.

9 Grame’s Faust and JGU’s Pure are two functional programming
languages geared towards signal processing and multimedia
applications [7,8].

With the expansion onto other platforms, Pd-L2Ork’s key challenge
is ensuring sustainable growth. As with any other open-source
project of its size and scope, this can only be achieved through
fostering greater community participation in its development and
maintenance, so please do not hesitate to contact us if you would
like to help!
Abstract

We present a set of tools to quickly implement modal physical
models in the Faust programming language. Models can easily be
generated from any impulse response or 3D graphical representation
of a physical object.

This system targets users who have little knowledge of physical
modeling and want to use this type of synthesis technique in a
musical context.

Keywords 

Physical Modeling Synthesis, Faust Language, Digital Signal
Processing

1 Introduction 

The Faust programming language has proven to be well suited to
implement physical models of musical instruments [Michon and
Smith, 2011] using waveguides [Smith, 2010] and modal synthesis
[Adrien, 1991].

In this short paper, we present two Python scripts 1 that make it
easy to generate Faust modal physical models: ir2dsp.py and
mesh2dsp.py.

• ir2dsp.py takes an audio file containing an impulse response as
its main argument and converts it into a Faust file implementing
the corresponding modal physical model.

• mesh2dsp.py outputs the same type of model but takes an .stl 2
file containing the specification of any 3D object designed with a
CAD 3 program as its main argument.

Faust programs generated by ir2dsp.py and mesh2dsp.py are ready to
use and can be compiled to any of the Faust targets (standalone
applications, plug-ins, etc.).

1 https://github.com/rmichon/pmFaust/ 

All URLs in this paper were verified on 07/04/2017. 

2 STereoLithography. 

3 Computer-Aided Design. 


After briefly describing these two tools, we’ll evaluate them and
provide directions for future work.

2 Faust Modal Physical Model 

Any linear percussion instrument can be implemented using a bank
of resonant bandpass filters [Smith, 2010]. Each filter implements
one mode (a sine or cosine function) of the system and can be
configured by providing three parameters: the frequency of the
mode, its gain, and its resonance duration (T60).

Such a filter can be easily implemented in Faust using a biquad
filter (tf2) and by computing its poles and zeros for a given
frequency (f) and T60 (t60):

modeFilter(f,t60) = tf2(b0,b1,b2,a1,a2)
with {
  b0 = 1;
  b1 = 0;
  b2 = -1;
  w = 2*PI*f/SR;
  r = pow(0.001, 1/float(t60*SR));
  a1 = -2*r*cos(w);
  a2 = r^2;
};

mode(f,t60,gain) = modeFilter(f,t60)*gain;
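
To make the role of r concrete, here is a minimal Python sketch of the same coefficient computation, followed by a direct-form biquad used to inspect the impulse response. It assumes that w is the normalized angular frequency 2*pi*f/SR and that tf2 follows the usual convention in which a1 and a2 are the denominator coefficients; the function names are ours and not part of the Faust libraries. The pole radius r is chosen so that r raised to the power t60*SR equals 0.001, i.e., the mode has decayed by 60 dB after t60 seconds.

```python
import math

def mode_coefficients(f, t60, sr):
    """Numerator and denominator coefficients of one mode filter."""
    w = 2 * math.pi * f / sr               # assumed: normalized angular frequency
    r = math.pow(0.001, 1.0 / (t60 * sr))  # r**(t60*sr) == 0.001 (-60 dB)
    return (1.0, 0.0, -1.0), (-2 * r * math.cos(w), r * r)

def biquad(x, b, a):
    """y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

sr = 8000
b, a = mode_coefficients(440.0, 0.1, sr)
h = biquad([1.0] + [0.0] * 1199, b, a)   # impulse response, 0.15 s long

early = max(abs(v) for v in h[:40])      # level right after the impulse
late = max(abs(v) for v in h[800:840])   # level after t60 = 0.1 s
```

Comparing `late` with `early` shows a drop of roughly three orders of magnitude, which is what the T60 parameter is meant to control.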

The modeFilter function can be easily applied in parallel in Faust
using the par operator to implement any modal physical model:

model = _ <: par(i,nModes,mode(freq(i),t60(i),gain(i))) :> _;

The Faust-generated block diagram corresponding to this code, with
nModes = 4, freq(i) = 100*(i+1), and (t60(i),gain(i)) successively
set to (0.9,0.9), (0.8,0.9), (0.6,0.5) and (0.5,0.6), can be
visualized in Figure 1.
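
The split/parallel/merge structure can also be written out in plain Python as a rough cross-check of what the Faust expression computes. This is a sketch under assumptions: the parameter values are our reading of the example (frequencies of 100, 200, 300 and 400 Hz), the sampling rate of 8000 Hz is arbitrary, and the function names are illustrative.

```python
import math

def mode(x, f, t60, gain, sr):
    """Filter x through one mode (biquad with b = 1, 0, -1) and scale it."""
    w = 2 * math.pi * f / sr
    r = math.pow(0.001, 1.0 / (t60 * sr))
    a1, a2 = -2 * r * math.cos(w), r * r
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = xn - x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(gain * yn)
    return out

def model(x, freqs, t60s, gains, sr):
    """Feed x to every mode and sum the outputs, like `_ <: par(...) :> _`."""
    outputs = [mode(x, f, t, g, sr) for f, t, g in zip(freqs, t60s, gains)]
    return [sum(samples) for samples in zip(*outputs)]

n_modes = 4
freqs = [100 * (i + 1) for i in range(n_modes)]
t60s = [0.9, 0.8, 0.6, 0.5]
gains = [0.9, 0.9, 0.5, 0.6]
impulse = [1.0] + [0.0] * 999
response = model(impulse, freqs, t60s, gains, sr=8000)
```

Since every mode passes the first input sample through with its own gain, the first output sample is simply the sum of the four gains; later samples are the superposition of four decaying sinusoids.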

Figure 1: Block diagram of a Faust modal physical model.

This type of model can be easily excited by a filtered noise
impulse (see Figure 2). The cutoff frequency of the lowpass and
highpass filters can be used to excite specific zones of the
spectrum of the model and to choose the “excitation position.”
Since this system is linear, the same behavior could be achieved
by scaling the gain of the different modes, but the filter
approach that we use here integrates better with our modular
physical modeling synthesis toolkit, briefly presented in §6.

White Noise -> Lowpass -> Highpass -> Envelope -> To Model

Figure 2: Excitation generator algorithm used 
to drive our modal physical models. 

3 ir2dsp.py 

ir2dsp.py takes an audio file containing an impulse response as
its main argument. After performing the Fast Fourier Transform
(FFT) on it, mode information is extracted by peak detection. The
T60 of each mode is computed by measuring its bandwidth at -3 dB.
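
The two steps described above can be sketched in Python as follows. This is not the code of ir2dsp.py: the function names are hypothetical, the peak picker is deliberately simple, and the bandwidth-to-T60 conversion assumes an ideal exponentially decaying sinusoid, whose power spectrum has a -3 dB width B = 1/(pi*tau) for a time constant tau, so that T60 = ln(1000)/(pi*B). The default threshold and spacing are the values quoted in the evaluation section.

```python
import math

def t60_from_bandwidth(bandwidth_hz):
    """T60 in seconds of an ideal decaying mode with this -3 dB bandwidth."""
    return math.log(1000.0) / (math.pi * bandwidth_hz)

def find_peaks(freqs, mags_db, min_db=-20.0, min_spacing_hz=100.0):
    """Local maxima above min_db, strongest first, at least min_spacing_hz apart.

    Returns a list of (frequency, level) pairs sorted by frequency.
    """
    candidates = [
        (mags_db[i], freqs[i])
        for i in range(1, len(mags_db) - 1)
        if mags_db[i] >= min_db and mags_db[i - 1] < mags_db[i] >= mags_db[i + 1]
    ]
    kept = []
    for level, freq in sorted(candidates, reverse=True):
        if all(abs(freq - f) >= min_spacing_hz for f, _ in kept):
            kept.append((freq, level))
    return sorted(kept)

# Toy magnitude spectrum (dB) sampled every 50 Hz.
freqs = [100, 150, 200, 250, 300, 350, 400]
mags_db = [-40, -10, -30, -12, -30, -5, -40]
peaks = find_peaks(freqs, mags_db)
```

Processing the candidates from strongest to weakest means that when two peaks are closer than the minimum spacing, the louder one is kept. A bandwidth of 1 Hz corresponds to a T60 of about 2.2 seconds under the stated assumption, and the T60 halves each time the bandwidth doubles.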

Mode information is formatted by ir2dsp.py so that it can be
plugged into a generic Faust modal physical model similar to the
one described in §2. The output of the Python program is a
ready-to-use Faust file implementing the model.

The goal of this tool is not to create very accurate models but
rather to be able to strike any object (e.g., a glass, a metal
bar, etc.), record the resulting sound, and turn it into a
playable digital musical instrument.

4 mesh2dsp.py 

The output of mesh2dsp.py is the same as that of ir2dsp.py (see
§3), but it takes an .stl file as its input instead of an impulse
response. .stl is a common format supported by most CAD programs
to export the description of 3D objects.

After converting the provided .stl file into a mesh, mesh2dsp.py
performs a Finite Element Analysis (FEA) using Elmer. 4 Various
parameters such as the Young Modulus, the Poisson Coefficient, and
the density of the material of the object must be provided to
carry out this task.

The result of the analysis is a set of eigenvalues and mass
participations for each mode. Eigenvalues are then converted to
mode frequencies and mass participations to mode gains.
Unfortunately, this technique does not allow the T60 of the modes
to be calculated, so it has to be configured by the user directly
in the Faust program.
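
A possible form of this conversion is sketched below in Python. It rests on two assumptions that are not confirmed by the text: that each eigenvalue returned by the modal analysis is a squared angular frequency (so that f = sqrt(lambda)/(2*pi)), and that gains are obtained by normalizing the absolute mass participations to a maximum of 1. The function name and the optional frequency limit are illustrative only.

```python
import math

def modes_from_fea(eigenvalues, mass_participations, max_freq=None):
    """Turn FEA results into a list of (frequency_hz, gain) pairs.

    Assumes eigenvalue = (2*pi*f)**2 and gain = |participation| / max.
    """
    pairs = []
    for lam, participation in zip(eigenvalues, mass_participations):
        if lam <= 0:
            continue  # skip rigid-body or numerically negative modes
        freq = math.sqrt(lam) / (2 * math.pi)
        if max_freq is not None and freq > max_freq:
            continue  # e.g. discard modes above the audible range
        pairs.append((freq, abs(participation)))
    peak = max((g for _, g in pairs), default=0.0) or 1.0
    return [(f, g / peak) for f, g in pairs]

eigenvalues = [(2 * math.pi * 440) ** 2, 0.0, (2 * math.pi * 880) ** 2]
participations = [0.5, 1.0, 0.25]
modes = modes_from_fea(eigenvalues, participations)
```

The resulting pairs have the same shape as the frequency and gain lists used by the modal model of §2, with the T60 values left to the user.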

5 Evaluation 

To evaluate the accuracy of ir2dsp.py, we recorded the impulse
response of a can and generated its corresponding modal physical
model. Figure 3 shows the spectrogram of the impulse response of
the can and Figure 4 the spectrogram of the impulse response of
the physical model generated by ir2dsp.py. ir2dsp.py was
configured to detect peaks with a minimum level of -20 dB that are
spaced at least 100 Hz apart. We see that the synthesized sound is
close to the recorded version. The T60s are not perfectly accurate
since they were calculated by measuring the bandwidth of each
mode. Tracking their evolution in the time domain would provide
better results; thus we plan to use this technique in the future
instead.





Figure 3: Spectrogram of an impulse response 
of a can 


4 https://www.csc.fi/web/elmer/

Figure 4: Spectrogram of the output of the modal model generated
with ir2dsp.py from a can IR

mesh2dsp.py was tested with the geometric 3D model of a solid bar
and provided good subjective auditory results. More objective
analysis is clearly needed here.

6 Future Work 
This work has been carried out as part of a larger project on
designing a physical modeling toolkit for the Faust programming
language. ir2dsp.py and mesh2dsp.py will be integrated into it.

We plan to improve ir2dsp.py by using a better T60 measurement
algorithm. Indeed, the T60 of each mode is currently computed by
measuring its bandwidth after taking the FFT of the entire impulse
response. A better approach would be to extract this information
from a time-frequency representation of the signal (i.e.,
spectrogram), which would be more accurate.

Finally, we would like to try open-source packages other than
Elmer to carry out the FEA in mesh2dsp.py, in order to get better
results and to smooth its integration into our Faust physical
modeling toolkit.


7 Conclusion 

We presented a series of prototype tools for designing ready-to-use
physical models of musical instruments at a very high level.
Models can be generated from impulse responses or 3D graphical
representations of physical objects.

While the models generated by this system are far from being
accurate, we believe that it provides a convenient way for
composers and musicians to design expressive custom instruments
usable in a musical context.
Abstract

This paper aims to demonstrate the use of fundamental frequency
estimation as a gateway to create a non-interactive system where
computers communicate with each other using musical notes.
Frequency estimation is carried out using the YIN algorithm,
implemented using the ofxAubio add-on in openFrameworks. Since we
have used only open-source technologies for the implementation of
this project, it can be executed on any platform: Linux, Macintosh
OS or Windows OS. As a simulation, this system is used to depict a
conversation between people at a round-table event and presented
as an audiovisual art installation.

Keywords 

Pitch Estimation, MIR, YIN algorithm, Aubio, 
Audio Simulation. 

1 Introduction 

Pitch detection algorithms have been used in 
various contexts in the past: 

• audio editing programs (pitch correction 
and time scaling) such as Melodyne 1 , 

• analysis of complicated melodies of world 
music cultures (Indian Classical music), 

• music notation programs like Sibelius 2 , 

• MIDI interfaces such as the Roland GI-20 
to get data from guitar MIDI pickups. 

Since it lies firmly within the domain of music information
retrieval (MIR), pitch estimation has many applications in
recommender systems, sound source separation, genre categorization
and even music generation. With the popularity of machine
learning, neural networks, and data mining, audio signal
processing tools are utilized more and more to create
user-specific systems. When considering applications for pitch
detection and source separation, we often consider the example of
identifying individual speakers at a round-table event. Computers,
unlike humans, have a difficult time identifying the words of a
particular person if multiple people are communicating with each
other simultaneously. Building on that concept, we simulate a
conversation between multiple computers using musical notes. The
YIN pitch detection algorithm is employed to detect trigger notes,
to which other computers in the network respond.

1 http://www.celemony.com/en/melodyne/what-is-melodyne. URLs in
this paper were verified on Feb. 16, 2017.

2 http://www.avid.com/sibelius

Since we desire a continuous conversation, we 
must use an efficient detection algorithm with 
the following features: 

• real time response, 

• minimal latency, 

• accurate identification in the presence of 
noise. 

We need to be careful about both the latency of the attack and of
the pitch detection algorithm since if a note is played, the human
ear needs at least seven periods of a waveform to identify its
pitch. Hence, note onsets and note pitches are not directly
related [3]. Additionally, we must ensure that the pitch
recognition algorithm is reasonably robust to the sort of noise
which is inevitable in a performance scenario.

After comparing different methods for pitch estimation, as in [3],
we chose the YIN algorithm for its real-time tracking ability. YIN
is a time-domain algorithm based on the autocorrelation method for
estimation [2]. Starting from the common autocorrelation method,
the algorithm analyzes and corrects its error rates at every new
iteration to ensure the best possible accuracy. Using YIN is
beneficial since it can accurately analyze higher frequencies
which we might use as trigger notes in our system. To make sure
that we have the lowest possible pitch identification latency and
a very small frame size for incoming audio, we use the YIN
algorithm implemented in the aubio framework [1], extended further
as an addon in openFrameworks 3 for our computer network.
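
For readers unfamiliar with YIN, the following self-contained Python sketch shows its core steps: the difference function, the cumulative mean normalized difference, an absolute threshold, and parabolic interpolation. It is not the aubio implementation used in our system; the function name, the threshold of 0.15, and the minimum frequency of 80 Hz are assumptions chosen for illustration, and the plain loops are far too slow for real-time use.

```python
import math

def yin_pitch(frame, sample_rate, threshold=0.15, min_freq=80.0):
    """Estimate the fundamental frequency of frame in Hz, or None if unvoiced."""
    max_lag = int(sample_rate / min_freq)
    window = len(frame) - max_lag
    if window <= 0:
        raise ValueError("frame too short for the requested min_freq")
    # Step 1: difference function d(tau).
    diff = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative mean normalized difference d'(tau).
    cmnd = [1.0] * (max_lag + 1)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0
    # Step 3: first dip below the threshold, followed down to its local minimum.
    tau = 2
    while tau < max_lag:
        if cmnd[tau] < threshold:
            while tau + 1 < max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            break
        tau += 1
    else:
        return None  # no lag was periodic enough
    # Step 4: parabolic interpolation around the chosen lag.
    a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
    denom = a - 2 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (tau + shift)

sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
estimate = yin_pitch(frame, sr)
```

The frame must span at least the longest lag plus the analysis window, which is where the trade-off between frame size, lowest detectable pitch, and latency comes from.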

2 Methods and Implementation 

Since fundamental frequency identification works well on
monophonic sounds (e.g., a guitar solo or any wind instrument), it
was decided to use this approach to reduce the problem of
incorrect triggers for the other computers in the network.
Individual notes are generated one after the other at randomized
tempos (to mimic the prosody of human speech) in a particular
musical scale. Every computer has its own “voice” which does not
overlap with that of any other computer in the network. At a given
point in time, multiple “people” can “speak” simultaneously.

The YIN algorithm is accurate enough to identify the trigger notes
at any given time within the chaotic yet pleasing tone of the
conversation, and the computers react accordingly. Visual feedback
is provided to portray whether a computer is voicing itself or is
remaining silent in response to the trigger note. For the test
system, three computers, each with their own voice, constituted
the network.



Figure 1: Overview of the system 


2.1 Audio 

The voice of each “speaker” in the system is a musical scale. For
our demonstration, the computers were set to the G, C, and D major
scales, respectively. The choice of these particular scales was
arbitrary. The ofxStk 4 add-on was used to generate the sounds
using the Moog synthesis class. The generated sound has a reverb
effect applied to it in order to create a wider stereo image when
all computers are in place.

3 See https://github.com/aubio/ofxAubio and
http://openframeworks.cc/

In order to mimic the characteristics of human speech, there is no
fixed tempo for the system. When a particular trigger note is
heard by a computer, it goes silent, shifts its scale an octave
higher or lower, and also changes its corresponding trigger note
in order to keep the whole system ambiguous. The ambiguity lies in
the fact that when all computers are communicating simultaneously,
it is difficult to identify what the actual trigger note is, which
maintains the illusion of an improvised conversation. The exact
flow of the network is represented in Figure 1. This flow remains
constant for any new computer added to the system.
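
The reaction of a single computer to its trigger note can be summarized by a small state machine. The Python sketch below is only an illustration of the behavior described above, not the openFrameworks code of the installation: the class `Voice`, the use of MIDI note numbers, the range of note durations, and the way a new trigger is drawn from a list of candidates are all assumptions.

```python
import random

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a major scale

def major_scale(root_midi):
    return [root_midi + step for step in MAJOR_STEPS]

class Voice:
    """One computer: plays notes of its scale and reacts to a trigger note."""
    def __init__(self, root_midi, trigger_midi, rng):
        self.scale = major_scale(root_midi)
        self.trigger = trigger_midi
        self.active = True
        self.rng = rng

    def next_note(self):
        """Pick a note and a randomized duration in seconds; None while silent."""
        if not self.active:
            return None
        return self.rng.choice(self.scale), self.rng.uniform(0.2, 1.0)

    def hear(self, midi_note, candidates):
        """On the trigger: go silent, shift by an octave, pick a new trigger."""
        if midi_note != self.trigger:
            return False
        self.active = False
        shift = self.rng.choice([-12, 12])
        self.scale = [note + shift for note in self.scale]
        others = [note for note in candidates if note != self.trigger]
        self.trigger = self.rng.choice(others)
        return True

rng = random.Random(0)
voice = Voice(root_midi=67, trigger_midi=60, rng=rng)  # G major, trigger C4
note = voice.next_note()
```

Passing the random number generator in explicitly keeps the sketch reproducible; in the installation, the detected pitch from the pitch tracker would play the role of `midi_note`.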

2.2 Graphics 

Graphical feedback is used to convey whether a 
particular computer is active or silenced. We 
used the ofxParticles add-on to implement 
particle physics in order to visually represent 
the current state of our system. 



Figure 2: Screenshot of the particle physics 
graphics 


White particles at any given instant represent an “inactive state”
of the computer, and colored particles are used when the computer
is active (as shown in Figure 2). The color of the particles for
each computer changes when it becomes active after being inactive
for a random amount of time, as if joining the conversation again
to state its opinion. Particles are emitted from the exact center,
spiral outward, and eventually fade out. There is a
pseudo-gravitation effect that makes the particles orbit around
the center of the screen after their spiral trajectory. We
designed this visual tool to create a psychedelic effect for the
audience.

4 https://github.com/Ahbee/ofxStk, which encapsulates the original
Synthesis ToolKit, https://ccrma.stanford.edu/software/stk/

LAC2017 - CIEREC - GRAME - Universite Jean Monnet - Saint-Etienne - France

2.3 Evaluation 

When the system was presented to a group of observers, they did
not detect the underlying pattern in the changes of scale and
trigger conditions. The audience reported that they believed the
computers were having a conversation, albeit through musical
notes. They also responded that the visual feedback gave each
computer a unique personality.

3 Future Work 

The future scope of this project involves the integration of
machine learning in order to implement musical improvisations as a
response to the trigger conditions. This would make the whole
system more expressive than the current trivial actions (i.e.,
having a computer go silent or become active). Polyphonic sound
identification is also a viable addition that would give a more
immersive experience of musicians improvising with one another.

4 Conclusion 

In this paper we presented pitch estimation as a tool for a
musical performance system. Since the future scope does involve
the use of machine learning and data mining techniques, as is the
custom with music information retrieval, this was a relevant
stepping stone. Presently, this system is being modified to
include multiple triggers for the whole system. We are trying to
move away from the initial condition of having only one computer
respond to a single trigger note, and instead have multiple
computers react to more than one trigger added to the network.
Porting WDL-OL to LADSPA/LV2

Abstract

WDL-OL is an open source framework that is used to develop audio
plug-ins in various formats (VST, VST3, AU, RTAS, AAX) for Mac and
Windows operating systems. The proposition is to add the
possibility to develop plug-ins in LADSPA and LV2 formats under
Linux.

Keywords

Audio Plug-ins, WDL-OL, LADSPA, LV2

1 Introduction

WDL / IPlug is a simple-to-use C++ framework for developing cross
platform audio plugins and targeting multiple plugin APIs with the
same code. Originally developed by Schwa/Cockos, IPlug has been
enhanced by various contributors, in particular Oliver Larkin,
whose version seems to be the most used.

IPlug depends on WDL, and that is why this project is called
WDL-OL, although most of the differences from Cockos' WDL are to
do with IPlug. The source code for the framework can be downloaded
from the WDL repository [1].

There is also a very active discussion forum about WDL [2].

2 The Geek Meeting

The author has developed a few freeware plug-ins (pictured here:
Clockwise Orange which manages multi tap multi delays, and
Strawberry Feel which provides a graphical audio language) using
WDL-OL, and he wishes to make them available to the Linux
community. Some other plug-in authors may wish to do the same
thing, and thereby add to the range of plug-ins available under
Linux. For this, we need to add LADSPA/LV2 support to WDL-OL (for
DAWs, but also for tools like ChucK), and to find people
knowledgeable on the subject and willing to help us in this
development.

References 

[1] WDL Git Repository:
https://github.com/olilarkin/wdl-ol

[2] WDL discussion forum:
http://forums.cockos.com/forumdisplay.php?f=32
Workshops


1. Yoshimi and the reluctant developer by Will Godfrey (United Kingdom)

A workshop overview of my involvement with the Yoshimi soft-synth, discussion and
current status, including demonstrations.


See https://sourceforge.net/projects/yoshimi/ 
2. Free Software and DIY at Radio Panik by Frederic Peters, Arthur Lacomme and
Suzie Suptille (Belgium)


Radio Panik is a community FM radio station created in Brussels in 1983. It has been using,
adapting and creating free software for much of its activity for years. This workshop
aims to explore and discuss our current audio practices, from broadcast systems to
creative tools.


See http://www.radiopanik.org/ 

3. Building a local Linux audio community by Daniel Appelt (Germany)

The Open Source Audio Meeting Cologne is a monthly gathering of audio and free
software enthusiasts. This workshop provides insights into how such a regular event may be
organized.

See http://cologne.linuxaudio.org

4. Generative Music with Recurrent Neural Networks by Kosmas Giannoutakis

(Institut für Elektronische Musik und Akustik, Inffeldgasse, Graz, Austria)

In this workshop the generative music capabilities of artificial recurrent neural
networks will be explored, using an abstractions library for the programming
environment Pure Data, called RNMN (Recurrent Neural Music Networks). The library
provides the basic building blocks, neurons and synapses, which can be arbitrarily
connected, easily and conveniently, creating compound topologies. The framework
allows real-time signal processing, which permits direct interaction with the topologies
and quick development of musical intuition. For the workshop, participants need a laptop
with a built-in microphone, Pure Data installed (PD-vanilla version 0.47.1 is recommended)
and headphones. Experience with visual programming
is a plus but not a necessary prerequisite. The basic principles of the
framework will be explained and the construction of some basic topologies will be
demonstrated. At the end, participants can create their own topologies and demonstrate
them to the other participants.



5. Origin, features and roadmap of the MOD* Duo by Mauricio Dwek, Gianfranco 
Ceccolini & Filipe Coelho (Germany) 

(* Musical Operating Devices For Experienced Musicians) 

In this workshop, MOD Devices tells its story and shows its heavy use of Linux Audio 
technologies for the MOD Duo. 

The workshop will consist of: 

• The history of how the MOD Duo came to be

• What (Linux Audio) technologies are used inside 

• Challenges and difficulties found while making the Duo 

• Showing the Duo in action 

• Things to come soon 

The audience will be encouraged to get their hands on the device and try out its features. 
See https://moddevices.com/ 
6. Too Much Qstuff* To Handle by Rui Nuno Capela (Portugal)

Following in the tradition of LAC2013@IEM-Graz, LAC2014@ZKM-Karlsruhe,
LAC2015@JGU-Mainz and miniLAC2016@c_base-Berlin, this talk/workshop is once
again being proposed as an informal opportunity for open debate and discussion about
the so-called Qstuff* software constellation. Although it stars Qtractor [4] as the main
subject, all users and developers are welcome to attend, whether or not they're using
any of the Qstuff*. An all-inclusive talk/workshop.

The Qstuff* are, in order of appearance: 

[1] QjackCtl - A JACK Audio Connection Kit Qt GUI Interface
http://qjackctl.sourceforge.net
https://github.com/rncbc/qjackctl

[2] Qsynth - A fluidsynth Qt GUI Interface
http://qsynth.sourceforge.net
https://github.com/rncbc/qsynth

[3] Qsampler - A LinuxSampler Qt GUI Interface
http://qsampler.sourceforge.net
https://github.com/rncbc/qsampler
https://github.com/rncbc/liblscp

[4] Qtractor - An audio/MIDI multi-track sequencer
http://qtractor.org
http://qtractor.sourceforge.net
https://github.com/rncbc/qtractor

[5] QXGEdit - A Qt XG Editor
http://qxgedit.sourceforge.net
https://github.com/rncbc/qxgedit

[6] QmidiNet - A MIDI Network Gateway via UDP/IP Multicast
http://qmidinet.sourceforge.net
https://github.com/rncbc/qmidinet

[7] QmidiCtl - A MIDI Remote Controller via UDP/IP Multicast
http://qmidictl.sourceforge.net
https://github.com/rncbc/qmidictl

[8] synthv1 - an old-school polyphonic synthesizer
http://synthv1.sourceforge.net
https://github.com/rncbc/synthv1

[9] samplv1 - an old-school polyphonic sampler
http://samplv1.sourceforge.net
https://github.com/rncbc/samplv1

[10] drumkv1 - an old-school drum-kit sampler
http://drumkv1.sourceforge.net
https://github.com/rncbc/drumkv1
7. Moony - rapid prototyping of LV2 (MIDI) event filters in Lua by Hanspeter
Portner (Switzerland)

In need of a specific event filter that does not yet exist for your DAW or live setup? No time or
skill to write your own MIDI plugin in C/C++? Moony comes to the rescue and lets you
script your filters in Lua on-the-fly for any LV2 host. Come and learn about the LV2 atom
event system and rapid prototyping in Lua.

See https://open-music-kontrollers.ch/ 

8. Interactive music with i-score by Jean-Michael Celerier 

(Laboratoire Bordelais de Recherche en Informatique, France) 

This workshop presents the i-score (www.i-score.org) sequencer. 

It will present the challenges and the rationale that led to the creation of the software, 
that is, providing a dedicated tool for temporal design in an interactive context. 

The construction of a score will be illustrated with practical examples involving audio-visual
features and interaction with familiar creative coding environments such as Pure Data,
openFrameworks or Processing.

See http://www.i-score.org 
9. Smartphone Passive Augmentation by John Granzow* & Romain Michon**

(* University of Michigan - United States, ** Stanford University - United States)

In this 4-hour workshop (2 sessions) we will present Mobile3D, a library for
introducing passive musical augmentations to mobile phones. The library allows
participants to quickly leverage the parametric features of OpenSCAD, a functional
programming language for text-based computer-aided design (CAD). Several 3D
printers will be on hand to materialize designs. The workshop gives participants the
tools to customize their smartphones for musical interaction.

See https://ccrma.stanford.edu/~rmichon/mobile3D/ 
Concerts

Thursday, May 18 - 06:30pm - 08:00pm - Auditorium de la Maison de
l'Universite - 10 rue Trefilerie - Saint-Etienne

Concert N°1: Mixed Music

This first concert features works of mixed music (music in which acoustic instruments and
electronic devices interact). The electronics for these pieces were
developed with real-time free software (FAUST and SuperCollider). The works are
played by regional musicians and students from Saint-Etienne.

1. SmartMachine musicale (20' - 2017 - Premiere)

With the students of La Salle secondary school (Saint-Etienne), Robert Chauchat, music
teacher, Romeo Monteiro, percussionist and soloist of the Ensemble Orchestral Contemporain,
and Gerard Authelain, teacher and musician (GRAME).

Guided by their teacher and musicians, the students have explored the relationship 
between the technical object and the musical creation, but also the musical gesture 
(using the Smartfaust apps for smartphone designed by GRAME, National Center of 
Music Creation in Lyon). They have gradually created a new kind of sound scene, 
halfway between a performance and a visual installation. They invite you tonight to 
discover a space of research, questioning the aesthetic of the concert, the musician’s 
attitude and his instrument! 

Project conducted as a partnership between GRAME and the Ensemble Orchestral
Contemporain, with support from DRAC Auvergne - Rhone-Alpes and the
Department of Loire.

2. Smartbones (20’ - 2016) 

With brass section students of the Conservatory of Valence (Leane Berthaud, Alice
Chakroun, Noam Leenhardt, Rami Leenhardt, Meryem Ouannas, Antonin Vinay) and
Pierre Bassery, musician and teacher.

To arouse curiosity and attentive listening, the students of the Conservatory of
Valence, guided by the trombonist Pierre Bassery, combine repertoire, improvisation,
creation and new technologies in pieces for trombone/tuba. Mounting
smartphones on the instruments allows the musicians to play with Smartfaust apps as
a continuation of the instrumental gesture. These new instruments
("Smartbones": smartphones & trombones) will generate sound material that will lead
the students to imagine choreographies in which dance, trombone and electronics will be
mixed and confronted.

3. Peyote (16' - 2016) by Sebastien Clara, mixed music for trio & electronics 

With Alice Szymanski, flute, Justine Eckhaut, piano and Florent Coutanson, saxophones. 

In 1936 Antonin Artaud left Europe to travel to Mexico. This departure symbolizes his 
break with surrealist aesthetics. "The rationalist culture of Europe has gone bankrupt 
and I have come to the land of Mexico to seek the bases of a magical culture that can still 
spring from the forces of Indian soil." However, by a romantic desire to touch the bottom 
or a personal inner experience to better understand his fellows, Artaud undertakes "a 
descent to come out of the day", a journey within his journey, a transcendental abysm. 
Artaud sets out to research "the ancient solar culture". To do this, he wants to be
introduced to the culture of the men and women of the Sierra Tarahumara. 

Peyote retraces the dances of the Tarahumara to the glory of the sun, which undeniably 
influenced the work of Antonin Artaud. 


Friday, May 19 - 08:00pm - 12:00pm - Auditorium de la Maison de
l'Universite - 10 rue Trefilerie - Saint-Etienne

Concert N°2: Electronic Music

This concert of electronic music features musicians from the USA and from various
European countries. They will play works, mostly experimental, realized with diverse
and innovative digital devices, all involving Open Source software.

1. Inaudible Harp (10') by Bruno Ruviaro & Juan-Pablo Caceres 

(Santa Clara University - USA) 

Music for a distant harp, played via the Internet, and a computer system for real-time sound
synthesis and processing.

One of the "origin tales" of Ambient Music has Brian Eno stuck in a hospital bed after an
accident: lying immobile in bed, he would listen to records played by visiting friends.
One day it was harp music, with the volume turned so low that the plucked strings were
almost inaudible. "At first I thought, 'Oh God, I wish I could turn it up,'" Eno remembers.
"But then I started to think how beautiful it was. It was raining heavily outside and I
could just hear the loudest notes of the harp coming above the level of the rain." Our
telematic duo improvisation is inspired by this image.

The duo uses SuperCollider to generate and process sounds, and JackTrip to stream
audio over the Internet. JackTrip is a Linux and Mac OS X-based system used for
multi-machine network performance over the Internet. It supports any number of channels (as
many as the computer/network can handle) of bidirectional, high-quality, uncompressed
audio signal streaming. The duo usually performs with one player on location, and the
other remotely from Chile or the USA.

2. Vox Voxel (12’ - 2015) by Fernando Lopez-Lezcano* & John Granzow** 

(* Center for Computer Research in Music and Acoustics - Stanford University - USA, ** 
University of Michigan - USA) 

Music for two interpreters, a 3D printer, a Daxophone, a Korg Nanokontrol 2 and a 
computer. 

From an IBM 720 line printer playing Three Blind Mice in 1954 to dot matrix printers 
playing love songs and Queen, mechanical noises coming from printers were slowly 
tamed, domesticated and controlled, and countless unproductive hours of programming 
time were spent in figuring out how to make those noises into musical notes, phrases 
and whole pieces for the enjoyment of the IT team. From deafening antique mainframe
line printers to whisper-quiet inkjets, all have been in the spotlight of a concert
performance (or at least a basement computer room).

VoxVoxel is "composed" by designing a suitably useless 3D shape and capturing the 
sound of the working 3D printer using piezoelectric sensors. Those sounds are 
amplified, modified and multiplied through live processing in a computer using Ardour 
and LV2/LADSPA plugins, and output in full matching 3D sound. 3D pixels in space. 

The piece is dedicated to our endangered wooden 3D printer, slowly declining with the
rise of folded metal frames in entry-level machines. The wood (if fragile) is good for
contact vibrations, to amplify rhythms of
the tool-path and the frequencies of stepper motors. This rare 3D printer takes six
minutes to warm up its extruder. For this, it has also fabricated an array of extensions
for its equally endangered human performer.

One view of an object and score for the piece



Sound Symposium 2016 performance 


3. Pointillism (20' - 2016) by Iohannes Zmolnig 

(Institute of Electronic Music and Acoustics - University of Music and Performing Arts - 
Graz - Austria) 

Pointillism is a solo live-coding performance in Pd. Both the code representation and the 
generated audio use "points" and "dots" as building blocks: the music is generated using 
morse code patterns, the code is written in "Braille” (the dot-based writing system 
especially designed for blind/visually-impaired people). 

The performance is an ironic statement on the popular "show-us-your-screens” 
paradigm, by presenting the code in a form that does not even pretend to be readable. 
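
The Morse-code idea mentioned above can be illustrated with a short sketch. It is not the performer's Pd patch; it only shows, in Python, one way a text could be turned into an on/off rhythm using the standard International Morse timing (dot = 1 unit, dash = 3 units, gaps of 1, 3 and 7 units between symbols, letters and words). The function name morse_rhythm and the (is_sound, duration) event format are assumptions made for this example.

```python
# Illustrative sketch only: map text to a Morse-code rhythm.
# International Morse code for the letters a-z.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".",
    "f": "..-.", "g": "--.", "h": "....", "i": "..", "j": ".---",
    "k": "-.-", "l": ".-..", "m": "--", "n": "-.", "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.", "s": "...", "t": "-",
    "u": "..-", "v": "...-", "w": ".--", "x": "-..-", "y": "-.--",
    "z": "--..",
}


def morse_rhythm(text, unit=1):
    """Return a list of (is_sound, duration) events for the given text.

    Characters without an entry in MORSE are skipped.
    """
    events = []
    for word_index, word in enumerate(text.lower().split()):
        if word_index > 0:
            events.append((False, 7 * unit))      # gap between words
        letters = [c for c in word if c in MORSE]
        for letter_index, letter in enumerate(letters):
            if letter_index > 0:
                events.append((False, 3 * unit))  # gap between letters
            for symbol_index, symbol in enumerate(MORSE[letter]):
                if symbol_index > 0:
                    events.append((False, unit))  # gap inside a letter
                duration = unit if symbol == "." else 3 * unit
                events.append((True, duration))
    return events
```

Calling morse_rhythm("pd", unit=0.1) would give durations in seconds that a synthesis environment could use to gate a tone on and off.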


4. Seven Sphere Journey (12' - 2016) by Broch Vilbjorg

(Anti-Delusion Mechanism - Netherlands) 

Music for computer generated sound, voice, dance and video projection 

This project explores generative sound, graphics and algorithmic composition based on
the octonion algebra. The octonions form an 8-dimensional normed division algebra. The
project started in 2016 and is in ongoing development. The octonions are an extremely
rich subject within mathematics, with symmetry relations to Lie algebras and not
least to the so-called special Lie groups.

The unit-length octonions trace the seven-sphere S7, a 7-dimensional surface in
8-dimensional space. The project explores orbits in 8-dimensional space for audio
synthesis and graphics.
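
For readers unfamiliar with the notation, the seven-sphere mentioned above can be written out explicitly. This is standard mathematics rather than a description of the piece's synthesis code; the coordinate names x_0, ..., x_7 and the basis symbols e_1, ..., e_7 are notation chosen for this note.

```latex
x = x_0 + \sum_{i=1}^{7} x_i e_i \in \mathbb{O}, \qquad
\lVert x \rVert^2 = \sum_{i=0}^{7} x_i^2, \qquad
\lVert xy \rVert = \lVert x \rVert \, \lVert y \rVert, \qquad
S^7 = \{\, x \in \mathbb{O} : \lVert x \rVert = 1 \,\}.
```

Because the norm is multiplicative, the product of two unit octonions is again a unit octonion, so an orbit generated by repeated multiplication of unit octonions stays on S7.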

5. 1J3V1532 (25' - 2012) by David Runge 

(Elektronic Studio, TU Berlin & c-base - Germany) 

1)3\/1532 (deviser) is drone/noise/experimental soundscaping. 

The aural journey leads to places like simple repetitive guitar tunes, loops, feedback
manipulation, modified samples of field recordings, DIY analog synthesizers, toys and
the like.

Experimentation for the greater good! 

6. 5-HT_five levels to zero (28' - 2016) by Tina Mariane Krogh Madsen & Malte 
Steiner 

(Block4 - Germany) 

For Linux computer with Pure Data, synthesizer, div pedals and instruments for noise 

The concert 5-HT_five levels to zero is based on the structural qualities of the 
neurotransmitter serotonin. The dynamics of the molecule will be improvised and 
performed live in a dynamic and counterbalanced noise act that deals with both the 
balances as well as the imbalances inherent, resulting in chaotic states caused by 
disruption of this unit in the brain. For the piece, TMS has created a score that captures 
the dynamics of the musical composition, where the audio will be accompanied by a 
projection of a visual score created in Blender and Pure Data. 



5-HT_five levels to zero concert, Piksel Festival, Bergen (NO), November 2016. Photo: Piksel Festival


7. Level 5 Alert (30' - 2016) by Frederic Peters, Arthur Lacomme and Suzie Suptille 

(Radio Panik - Belgium) 

Live performance; the show was created after the 2016 Brussels bombings as an
independent and uncontrolled means of expression, as a recurrent collective work mixing
experimental and creative writing, music and sound effects.
8. The Infinite Repeat (22') by Jeremy Jongepier

(linuxaudio.org - USA & MOD Devices - Germany)

For guitar, computer and voice.

A musician with over 25 years of experience and a computer with Linux. That's what it
boils down to. The result: conventional, solid song-writing, with an eclectic tinge
because of the choice not to walk the well-trodden paths, coupled with an autodidactic
background, an outspoken personal taste and an open-minded world-view.

This year The Infinite Repeat will be all about going back to the roots and mixing that up 
with the latest Linux based technology. So expect some solid acoustic singer-songwriter 
material with a modern touch and a Linux device on the floor. 

See http://theinfiniterepeat.com 

Saturday, May 21 - 08:30pm - 10:00pm - L'Estancot - 10 Rue Henri 
Dunant - Saint-Etienne 

Concert N°3: Acousmatic Music 

1. Voce 3316 (3’ - stereo), Massimo Fragala, Italy. 

2. Kecapi III (10’56 - octophony), Patrick Hartono, Indonesia. 

3. Inuti (9’ - 16 channels), Helene Hedsund, United Kingdom.

4. Profon (7’ - stereo), Magnus Johansson, Sweden. 

5. Dark Path #6 (5’ - stereo), Anna Terzaroli, Italy. 

6. Kruchtkammer (6’ - quadriphony), Lukas Tobiassen, Germany. 

7. Definierte Lastbedingung (11’40 - octophony ambisonic), Clemens Von Reusner, 
Germany. 

8. Suite of miniatures (10’ - stereo), Bernard Bretonneau, France 

9. ConcretX (5’ - ambisoniX), Jean-Marc Duchenne, France 
Multimedia Installations

1. OUPPO (Harris Louise - United Kingdom) 

Ouppo is a generative audiovisual installation composed of four modular units for video 
mapping. 


2. PROFON (Julien Ottavi - France) 

Profon is a performative device composed of a collection of flower pots moved in time 
and space. 


3. SONIC CURRENT (Giannoutakis Kosmas - Austria) 

Sonic Current is a site-specific sound installation which transforms architectural locations
into "sonic conscious" organisms. The transformation of the site into a body, with its
sense organs (microphones) and actuators (loudspeakers), enables the site to articulate
and manifest itself in an open dialogue with its visitors.


4. THETA FANTOMES (Apo33 Collective - France) 

Theta Fantomes is a cross-disciplinary digital game/art project. It is an art piece
developed by APO33 to realise some of our ideas about using real-time neuronal data
processing with game play in a hybrid transcendental experience.


5. ZIC STREET BOX (Lionel Rascle)

Conservatoire de Musique de Saint-Chamond - France

ZicStreetBox is an interactive installation intended for public demonstration and for raising
awareness of the use of technologies for sound art creation. The songs used in the
installation are designed by the pupils of the computer music course at the Saint-Chamond
music school.


6. SOUND SCULPTURES (Thomas Barbe - France) 

The sound sculptures of Thomas Barbe question the link between sculptural form and 
sonorous generation. His creations situate the sound in the spatial environment in a 
tangible way. They question the link between visual form and sound activity. 